BACKGROUND OF THE INVENTION



The present application claims priority to co-pending provisional application Serial No. 
60/029,760, filed October 25, 1996, the entire text and figures of which disclosure are 
specifically incorporated herein by reference without disclaimer. The U.S. Government owns 
rights in the present invention pursuant to grant number R01-GM35500 from the National 
Institutes of Health. 

1. Field of the Invention 

The present invention relates generally to the fields of cellular biochemistry and viral 
replication. More particularly, it concerns the discovery that a certain factor, P-TEFb, has a 
central role in transcription elongation control, that it phosphorylates RNA polymerase II and 
that it binds to the HIV protein, Tat. The invention provides human genes encoding the P-TEFb 
subunits, various other biological components, and methods relating to the control of 
transcription elongation that have particular utility in the identification of substances that inhibit 
viral replication. 

2. Description of Related Art 

The production of any functional eukaryotic mRNA requires efficient transcription 
elongation by RNA polymerase II. Eukaryotic gene expression is controlled in part during the 
elongation phase of transcription. Shortly after initiation, RNA polymerase II acquires the 
properties necessary to synthesize full length pre-mRNAs (Spencer and Groudine, 1990b; 
Kerppola and Kane, 1991; Wright, 1993; Bentley, 1995; Maldonado and Reinberg, 1995). As is 
frequently found in control processes, there is a negative control mechanism which is manifest as 
a blockage during early elongation. 

Blocks in transcription, usually referred to as premature termination, have been observed
during transcription in a number of systems, including mammalian genes such as α-tubulin
(Middleton and Morgan, 1990; Hair and Morgan, 1993), and in viruses, such as adenovirus
(Kessler et al., 1989), simian virus 40 (SV40) (Kessler et al., 1991), minute virus of mice
(Krauskopf et al., 1991) and human immunodeficiency virus (HIV) (Laspia et al., 1989). RNA
polymerase II molecules are also found blocked, during elongation, near the promoter on many
genes in Drosophila melanogaster (Rougvie and Lis, 1988; 1990).

Except for the involvement of the viral Tat protein in HIV gene expression (Marciniak 
and Sharp, 1991), little is known about the molecular mechanisms involved in elimination of this 
block. In vivo, HIV transcription is tightly controlled by the viral Tat protein. It is known that 
Tat acts as a potent transcriptional transactivator by binding to the transactivation response 
(TAR) region on the nascent RNA and interacting with cellular factors (Garcia and Gaynor,
1994; Jones and Peterlin, 1994). Still, many issues remain to be clarified concerning the 
interactions and functional regulation of transcriptional elongation and Tat, and further 
information is needed before effective anti-HIV strategies can be developed based upon 
intervention connected with Tat activity. 

Concerning cellular genes, the transcription of the proto-oncogene, c-myc, is regulated by
a block during elongation (Miller et al., 1989; Spencer and Groudine, 1990a; Wright and Bishop,
1989). C-myc expression was studied in Xenopus oocytes and isolated HeLa nuclei. The block
to elongation in c-myc occurs close to the promoter, termed "promoter-proximal pausing", and
only short RNAs are produced (Krumm et al., 1995; Meulia et al., 1993; Strobl and Eick, 1992).
A block to elongation at the end of the first exon regulates the levels of c-fos RNA in response to
tumor promoters and intracellular calcium levels (Collart et al., 1991; Mechti et al., 1991).
Other blocks to elongation occur in the transcription of the proto-oncogenes c-myb (Bender et al.,
1987; Reddy and Reddy, 1989) and c-fms (Yue et al., 1993). The RNA levels for the adenosine
deaminase genes (ADA) of humans and mice are at least partly controlled by a regulated block to
elongation (Ramamurthy et al., 1990; Chinsky et al., 1989; Chen et al., 1990; Chen et al., 1991;
Kash et al., 1993). Thus this elongation control process has been implicated in the expression of
many genes, yet the mechanism of control is not yet understood.

Studies in human, murine, Drosophila and Xenopus systems have demonstrated the
existence of two classes of elongation complexes differing in their potential to produce full
length mRNA sized transcripts. A model for the control of elongation has been described which
is based, in part, on results obtained from a Drosophila in vitro transcription system (Kephart et
al., 1992; Marshall and Price, 1992) and is consistent with data obtained in vitro and in vivo from
many studies.

Key features of the elongation control model are that all RNA polymerase II molecules
that initiate from a promoter are destined to produce only short transcripts, in a process termed
"abortive elongation". Abortive elongation is distinct from abortive initiation because the
abortive transcripts are 10 to 20 times longer during abortive elongation and, presumably, the
polymerase in the abortive elongation complexes must relocate the promoter after producing an
abortive transcript to bring about reinitiation. Escape from this negative control is accomplished
through the action of P-TEF (positive transcription elongation factor) which allows productive
elongation. Fractionation studies have recently identified three components believed to be
required to efficiently generate productive elongation complexes, P-TEFa, P-TEFb and factor 2
(Marshall and Price, 1995). P-TEFb was further purified and was shown to act after initiation
(Marshall and Price, 1995), although the protein was not subject to detailed biochemical
characterization.

The existence of two classes of transcription complexes differing in their elongation
potential has also been demonstrated using the nucleoside analog, 5,6-dichloro-1-β-D-
ribofuranosylbenzimidazole (DRB). The addition of DRB to mammalian cells in culture resulted
in a 95% inhibition in the production of mature mRNA (Sehgal et al., 1976). Nuclei isolated
from cells pre-treated with DRB have increased production of short, capped transcripts while
labeling of longer RNAs is decreased (Tamm and Kikuchi, 1979; Tamm et al., 1980). Similarly,
the short transcripts generated from viral templates in cells infected with SV40 (Laub et al.,
1980) and adenovirus (Fraser et al., 1979) are enhanced, while longer transcripts are suppressed
with DRB treatment. DRB also inhibits production of long transcripts but leaves shorter
products unaffected in injected Xenopus oocytes (Meulia et al., 1993; Roberts and Bentley,
1992).

The carboxyl-terminal domain (CTD) of RNA polymerase II is phosphorylated during the
transcription cycle at a time coincident with elongation regulation (Dahmus, 1994; Dahmus,
1995). The CTD can be phosphorylated by the kinase associated with the general transcription
factor TFIIH (Lu et al., 1992; Serizawa et al., 1992; Feaver et al., 1991), and a CTD kinase
activity is believed to be present in preinitiation complexes at several promoters (Peterson et al.,
1992; Kang and Dahmus, 1993). Research has been directed at identifying the CTD kinase, but
despite the proposal of various candidate kinases, it appears that the relevant kinase has not yet
been identified.

A kinase/cyclin pair (SRB10/11) is part of the holoenzyme form of yeast RNA
polymerase II (Liao et al., 1995). A number of other kinases, including casein kinase I and II
(Zandomeni et al., 1986; Cadena and Dahmus, 1987), DNA-dependent protein kinase (Dvir et
al., 1992), and a murine kinase related to cdc2 and CDC28 (Cisek and Corden, 1989), are
capable of phosphorylating the CTD. Also, the kinases CTD-K1 and CTD-K2 purified from
HeLa cells (Payne and Dahmus, 1993), CTK1 from yeast (Lee and Greenleaf, 1989), and KI,
KII, and KIII from Aspergillus nidulans (Stone and Reinberg, 1992) can all phosphorylate the
CTD. It has been further suggested that the stress activated MAP kinases are involved in
phosphorylating RNA polymerase II during heat shock (Venetianer et al., 1995). While all of the
above are serine/threonine kinases, there is one example of a tyrosine kinase, c-abl, that can
phosphorylate the CTD (Baskaran et al., 1993).

While phosphorylation of the CTD has been correlated with the elongation phase of
transcription, none of the kinases described above have been shown to modify the functional
properties of RNA polymerase II during elongation. Therefore, the identity of the kinase that
operates in this control process remains unknown. The mechanism by which CTD
phosphorylation induces the transition into productive elongation also remains to be determined,
as does the role in elongation control of various proteins associated with RNA polymerase II in a
holoenzyme complex (Koleske and Young, 1995). Although several models for the involvement
of CTD in elongation control have been proposed (Rasmussen and Lis, 1995), including
tethering of the polymerase to the promoter by the unphosphorylated CTD, no conclusive
evidence for one model is available. Further work is still needed to determine the fate of
polymerases that stop early, but do not enter productive elongation (Marshall and Price, 1992),
and to define the interaction of RNA polymerases in early elongation complexes with
termination factors, including "factor 2" (Xie and Price, 1996).



Thus, the role of termination factors and potential anti-termination factors, and the
regulation mechanisms that operate in the elongation control process remain to be clarified. Not
only will the identification of these factors and their respective properties be of significant
scientific interest, such discoveries would also have practical value beyond an understanding of
transcriptional control mechanisms. For example, many viruses produce viral proteins that
somehow interact with RNA polymerase II and facilitate the production of elongated viral
transcripts, an essential step in the viral life cycle. Therefore, the identification and
characterization of protein factors involved in productive elongation will likely yield benefits in
the development of anti-viral strategies.



SUMMARY OF THE INVENTION 

The present invention provides novel genes, proteins and related biological compositions 
developed for their ability to interact with RNA polymerase II and to control transcriptional 
elongation. The compositions of the present invention, which are based upon P-TEFb, are 
particularly beneficial as they also functionally interact with viral proteins, such as HIV Tat, that 
have central roles in viral transcription elongation. Methods for identifying substances that alter
transcription elongation are also provided by the invention, with particular emphasis on the 
identification of substances that inhibit the binding or functional interaction of viral proteins and 
P-TEFb, which substances are candidate anti-viral agents. 

The invention was, in part, initially based upon the inventor's surprising discovery that
the transcription elongation factor termed P-TEFb phosphorylates RNA polymerase II and
controls the transition from abortive to productive transcription elongation. The inventor further
discovered that human P-TEFb binds to the HIV transcriptional transactivating protein, Tat, and
that P-TEFb is the key host cell component that facilitates Tat-mediated viral mRNA elongation
during HIV infection. From the initial findings, the inventor developed screening assays to
identify substances that inhibit the interaction of human P-TEFb and viral proteins, such as HIV
Tat, and that will have utility as anti-viral agents.

Certain of the inventor's findings concern the cloning, for the first time, of each of the
subunits that make up the Drosophila P-TEFb enzyme complex. The Drosophila cloning
allowed the inventor to make the breakthrough in cloning the full length human counterpart
P-TEFb subunit genes. The small, kinase subunit of human P-TEFb is herein identified as the
product of a cDNA for which the nucleic acid sequence was known, but to which no known
function had been ascribed. The large, cyclin-like subunit of human P-TEFb has been discovered
by the present inventor and is disclosed for the first time in the present application. This
invention therefore provides novel human biological components, including genes, proteins and
purified holoenzymes, and also new screening methods based upon the use of the human P-TEFb
enzyme complex.

P-TEFb is a key regulator of the process controlling the processivity of RNA
polymerase II. The inventor has shown that P-TEFb can phosphorylate the CTD of pure RNA
polymerase II. Furthermore, P-TEFb can phosphorylate the CTD of RNA polymerase II when
the polymerase is in an early elongation complex. Both the function and kinase activity of
P-TEFb are blocked by the drugs DRB and H-8. P-TEFb is distinct from TFIIH because the two
factors have no subunits in common, P-TEFb is more sensitive to DRB than TFIIH and, most
importantly, TFIIH cannot substitute functionally for P-TEFb. This invention discloses that
phosphorylation of the CTD by P-TEFb controls the transition from abortive into productive
elongation mode.

Both the human and the Drosophila small, or kinase, subunits are members of the CDC2
family of kinases and are cyclin-dependent kinases. The human small subunit associates with the
activation domain of HIV-1 Tat, indicating that human P-TEFb is the Tat-associated kinase
(TAK), previously poorly characterized and not purified prior to the present invention. An in
vitro transcription assay demonstrates that the effect of Tat on transcription elongation requires
P-TEFb and suggests the enhancement of transcriptional processivity by Tat is due to enhanced
function of P-TEFb.



The present invention provides DNA segments, vectors and the like comprising at least 
one isolated gene, DNA segment or coding region that encodes a Drosophila P-TEFb kinase or
large subunit protein, polypeptide, domain, peptide or any fusion protein thereof. Further 
provided are at least a first isolated gene, DNA segment or coding region that encodes a human 
P-TEFb large subunit protein, peptide, domain or derivative. 



These aspects of the present invention may be described as follows: a DNA segment
comprising an isolated coding region that encodes a substantially full length P-TEFb subunit,
wherein the coding region is characterized as:

a) encoding a substantially full length P-TEFb kinase subunit having the amino acid
sequence of SEQ ID NO:2; or

b) encoding a substantially full length P-TEFb large subunit that includes a
contiguous sequence of at least about 7 amino acids from SEQ ID NO:4, SEQ ID
NO:45, SEQ ID NO:47 or SEQ ID NO:50; or as a substantially full length coding
region that hybridizes to the nucleotide sequence of SEQ ID NO:3, SEQ ID
NO:43 or SEQ ID NO:48 under stringent hybridization conditions.
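
By way of illustration only, the contiguous-sequence criterion of part b) can be
sketched computationally. The Python sketch below is not part of the disclosure: the
function name, the toy sequences and the exact-match rule are assumptions made for
illustration, and the sketch addresses neither hybridization nor biologically functional
equivalents.

```python
def shares_contiguous_run(candidate: str, reference: str, k: int = 7) -> bool:
    """Return True if `candidate` contains at least one exact run of k
    consecutive residues that also occurs in `reference`.

    Sequences are plain one-letter amino acid strings. The default k = 7
    mirrors the "at least about 7 amino acids" wording; exact matching is an
    assumption of this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(candidate) < k or len(reference) < k:
        return False
    # Every k-residue window of the reference sequence.
    reference_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    # Slide a k-residue window along the candidate and look for any hit.
    return any(
        candidate[i:i + k] in reference_windows
        for i in range(len(candidate) - k + 1)
    )


# Toy sequences only; these are NOT SEQ ID NO:4 or any sequence from the listing.
toy_reference = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
toy_candidate = "GGG" + toy_reference[5:12] + "PPP"
print(shares_contiguous_run(toy_candidate, toy_reference))
```

A set of reference windows keeps each lookup cheap; for real sequence listings an
alignment tool would normally be preferred over exact window matching.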



The term "substantially full length" as used herein, means that the genes and coding
regions of the invention encode a substantially full length P-TEFb kinase or large subunit protein
or polypeptide such that the subunit produced on expression of the gene or coding region
includes each of the polypeptide regions or domains necessary to impart functional activity to the
expressed product. As disclosed herein, even subunits from the same species may vary in length
and yet still have functional activity. For example, variations of the human P-TEFb large
subunits are provided by the present invention which have differing lengths and yet still have
biological activity. In particular, human large subunits of 696, 729 and 726 amino acids in
length are provided hereby. Genes and coding regions that encode a protein of between about
600 and about 750, or preferably of between about 650 and about 750 amino acids in length are
thus generally considered to be substantially full length genes or coding regions, although
smaller genes and coding regions are by no means excluded from the present invention so long as
they encode a protein that has biological activity when expressed.

It will also be understood that more variability in length will likely exist between genes 
and coding regions encoding P-TEFb subunits from different species. By way of example only, 
the Drosophila P-TEFb large subunits included within the invention encode a protein of about 
1097 amino acids in length that has biological activity. A comparison of this protein with the 
human large subunit proteins disclosed herein reveals that there may be some considerable
variability in the full length sequences of active subunits from different species. So long as a 
gene encodes a protein subunit that has biological activity as disclosed herein, this will be a 
"substantially full length" gene as this term is presently used. 

The P-TEFb kinase subunits provided by and for use in the present invention also
exemplify the concept of substantially full length and active proteins that may nonetheless have
certain differences in sequence and actual length. The Drosophila kinase subunit provided
hereby is about 404 amino acids in length, whereas the human subunit for use in the various
methods and combined compositions of the invention is about 372 amino acids long. Each of
these proteins has biological activity, as disclosed herein.

Although the concept of substantially full length sequences will be readily understood by
those of ordinary skill in the art, another means for assessing the substantially full length nature
of a gene, coding region or expressed protein of the invention is to analyze the terminal or
near-terminal sequences of the biological components and to confirm that they generally correlate
with sequences at the termini of the sequences provided herein, or sequences proximal to such
termini. By conducting such a comparison, one may identify substantially full length genes or
coding regions that express biologically active proteins with even more variation in length.

Such a terminal sequence comparison is contemplated to be an effective means for 
identifying substantially full length biological components that may have inserted into their 
sequence additional coding or non-coding sequences (in the context of DNA) or additional 
polypeptide sequences. Working examples of this phenomenon are also provided herein, as a 
comparison of the human P-TEFb large subunit of SEQ ID NO:45 and SEQ ID NO:47 reveals 
that the second encoded subunit contains additional polypeptide sequence that results from 
translation of an apparent intron in the DNA sequence. Genes encoding such introns and 
polypeptides including additional amino acid sequences are clearly encompassed within the 
present invention, and comparison of the substantially terminal sequences is an effective means 
for confirming the identity of longer sequences that are nonetheless P-TEFb coding regions or 
protein subunits.
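
The terminal sequence comparison described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not a method taken from the disclosure: the
window size, the identity threshold, the function names and the toy sequences are all
hypothetical, and a simple position-by-position identity count stands in for a proper
sequence alignment.

```python
def fraction_identical(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def termini_match(candidate: str, reference: str,
                  window: int = 10, min_identity: float = 0.8) -> bool:
    """Compare the first and last `window` residues of two sequences.

    Returns True when both termini reach `min_identity`. Both parameters are
    illustrative assumptions; the source text gives no numeric values.
    """
    if min(len(candidate), len(reference)) < window:
        return False
    n_terminal_ok = fraction_identical(candidate[:window], reference[:window]) >= min_identity
    c_terminal_ok = fraction_identical(candidate[-window:], reference[-window:]) >= min_identity
    return n_terminal_ok and c_terminal_ok


# Toy sequences only; these are NOT sequences from the sequence listing.
toy_reference = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
# A longer variant carrying extra internal sequence keeps both termini.
toy_with_insert = toy_reference[:15] + "AAAAAAAAAAAA" + toy_reference[15:]
# A truncated variant loses the C-terminal region.
toy_truncated = toy_reference[:20]

print(termini_match(toy_with_insert, toy_reference))
print(termini_match(toy_truncated, toy_reference))
```

Because only the ends are examined, a sequence with additional internal residues, such as
one arising from translation of an apparent intron, is still recognized, whereas a
truncated sequence is not.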

In providing a DNA segment or vector that comprises an isolated gene or coding 
sequence that encodes a P-TEFb kinase subunit, protein or peptide, particularly a Drosophila 
P-TEFb kinase subunit, protein or peptide, the subunit may be generally characterized as: 

a) having an observed molecular weight of between about 42 kD and about 43 kD,
generally as measured by gel filtration chromatography and SDS-PAGE (sodium
dodecyl sulfate polyacrylamide gel electrophoresis); and having an
actual molecular weight of about 47 kD, as calculated from the known protein
sequence; and optionally characterized as

b) capable of forming a cyclin kinase pair with the P-TEFb large subunit protein,
e.g., capable of binding to the Drosophila P-TEFb large subunit protein under
suitable binding conditions, such as conditions using non-denaturing buffers of
non-dissociating ionic strength and non-dissociating pH, which buffers generally
approximate in effect to the non-denaturing, non-dissociating cellular or nuclear
conditions in which the P-TEFb protein complex exists in the natural state; and
optionally characterized as

c) being a cyclin-dependent kinase (CDK) that is capable of phosphorylating RNA
polymerase II, and capable of phosphorylating RNA polymerase II when
operatively combined with a P-TEFb large subunit protein, e.g., the kinase subunit
is capable of binding to Drosophila RNA polymerase II in a manner and for a
period of time effective to catalyze the transfer of a phosphate group to RNA
polymerase II from an available phosphate group donor molecule, such as
adenosine triphosphate (ATP); wherein the RNA polymerase II is preferably
phosphorylated on the carboxyl terminal domain (CTD) of the Drosophila large
subunit of RNA polymerase II;

(i) more preferably, being identified as a DRB-sensitive cyclin-dependent kinase, 
wherein the phosphorylation of RNA polymerase II by the isolated 
P-TEFb kinase subunit protein, or the P-TEFb kinase subunit protein 
within the P-TEFb enzyme complex, is inhibited by an effective amount of 
DRB, and most preferably, wherein the phosphorylation of RNA 
polymerase II by the P-TEFb kinase subunit protein is inhibited by an 
effective amount of DRB and by an effective amount of H-8; and 
optionally characterized as 

d) being involved in the control of elongation by RNA polymerase II, and generally
being capable of promoting transcription elongation when combined with a
P-TEFb large subunit protein, e.g., capable of promoting proper elongation of
mRNA transcripts when operatively combined with a P-TEFb large subunit to
form a P-TEFb enzyme complex and wherein the P-TEFb enzyme complex is
contacted with a functional Drosophila RNA polymerase II molecule in the
presence of a DNA template and under conditions otherwise appropriate to result
in transcription elongation, i.e., in the presence of effective amounts of
nucleotides, ATP, other necessary co-factors and the like;

(i) more preferably, wherein the capacity to promote transcription elongation
when operatively combined with a P-TEFb large subunit protein, a
functional RNA polymerase II and the necessary transcriptional elongation
components, co-factors and precursors, is inhibited by an effective amount
of DRB.

The genes and DNA segments preferably encode a substantially full length P-TEFb
kinase subunit, protein or polypeptide that includes a contiguous amino acid sequence of at least
about 24 amino acids, and more preferably, of at least about 25, 27, 30, 35, 40, 45, 50, 60, 70,
80, 90, 100, 125, 150, 200 amino acids or so from SEQ ID NO:2, or a biologically functional
equivalent thereof. More preferably, the genes and DNA segments will encode a substantially
full length P-TEFb kinase subunit having the amino acid sequence of SEQ ID NO:2, or a
biologically functional equivalent thereof.

Preferably, the isolated genes and coding regions will include a contiguous nucleic acid
sequence of at least about 722 nucleotides, and more preferably, of at least about 725, 750, 800,
825, 850, 900 or so nucleotides from between position 115 and position 1326 or 1327 of SEQ ID
NO:1, or a biologically functional equivalent thereof. More preferably, the isolated genes and
DNA segments will comprise an isolated coding region having the nucleic acid sequence of the
foregoing coding region of SEQ ID NO:1, or a biologically functional equivalent thereof.

The human small kinase subunit, SEQ ID NO:6, comprises a single protein called
"PITALRE", so called because of the presence of those amino acids in a characteristic location in
the kinase subunit, which is encoded by the sequence presented in SEQ ID NO:5. It is a
cyclin-dependent kinase of the cell division cycle 2 (CDC2) family of kinases.

In providing new combined compositions and new uses for the DNA segments herein
discovered to encode the human P-TEFb kinase subunit protein (particularly combined 
compositions and uses in connection with DNA segments encoding a P-TEFb large subunit), the 
human P-TEFb kinase subunit gene is generally characterized as follows: 

a) encoding a human P-TEFb kinase subunit, protein or peptide that includes a 
contiguous amino acid sequence of at least about 6, 8, 10, 15, 20 or 24 amino 
acids or so from SEQ ID NO:6, or a biologically functional equivalent thereof; 
and optionally characterized as 

b) including a contiguous nucleic acid sequence of at least about 20-21 nucleotides 
or so, and more preferably, of at least about 30, 40, 50, 60, 72 or so nucleotides 
from the coding region of SEQ ID NO:5, or a biologically functional equivalent 
thereof. 

Preferably, these genes and DNA segments will encode a human P-TEFb kinase subunit
comprising, consisting essentially of, or having the contiguous amino acid sequence of SEQ ID
NO:6, or a biologically functional equivalent thereof; or the genes and DNA segments will
hybridize thereto under stringent hybridization conditions. These isolated genes and coding
regions will therefore preferably include a contiguous nucleic acid sequence corresponding to
substantially the full length coding region of SEQ ID NO:5, or a biologically functional
equivalent thereof; or the genes and DNA segments will hybridize thereto under stringent
hybridization conditions. Isolated genes and DNA segments having the nucleic acid sequence of
SEQ ID NO:5 are just one example.

In providing new uses for the newly discovered human P-TEFb kinase subunit protein, 
particularly in combination with other P-TEFb subunits, proteins or peptides such that, for 
example, a recombinant human P-TEFb holoenzyme, as in Example 5, is provided, the human 
P-TEFb kinase subunit protein is also generally characterized as follows: 

a) capable of forming a cyclin kinase pair with a human P-TEFb large
subunit protein, or other large subunit protein, such that it is capable of binding to
a human P-TEFb large subunit protein under suitable binding conditions, such as
conditions using non-denaturing buffers of non-dissociating ionic strength and
non-dissociating pH, which buffers generally approximate in effect to the
non-denaturing, non-dissociating cellular or nuclear conditions in which the P-TEFb
protein complex exists in the natural state;

b) being a cyclin-dependent kinase (CDK) that is capable of phosphorylating human
RNA polymerase II, and capable of phosphorylating human RNA polymerase II
when operatively combined with a human P-TEFb large subunit protein, i.e., the
kinase subunit is capable of binding to RNA polymerase II in a manner and for a
period of time effective to catalyze the transfer of a phosphate group to RNA
polymerase II from an available phosphate group donor molecule, such as ATP;
wherein the human RNA polymerase II is preferably phosphorylated on the
carboxyl terminal domain (CTD) of the large subunit of RNA polymerase II;

(i) more preferably, being identified as a DRB-sensitive cyclin-dependent kinase, 
wherein the phosphorylation of human RNA polymerase II by the isolated 
human P-TEFb kinase subunit protein, or the P-TEFb kinase subunit 
protein within the human P-TEFb enzyme complex, is inhibited by an 
effective amount of DRB, and most preferably, wherein the 
phosphorylation of RNA polymerase II by the P-TEFb kinase subunit 
protein is inhibited by an effective amount of DRB and by an effective 
amount of H-8; and 

c) being involved in the control of elongation by human RNA polymerase II, and
generally being capable of promoting transcription elongation when combined
with a human P-TEFb large subunit protein, i.e., capable of promoting proper
elongation of mRNA transcripts when operatively combined with a human
P-TEFb large subunit to form a P-TEFb enzyme complex and wherein the human
P-TEFb enzyme complex is contacted with a functional human RNA
polymerase II molecule in the presence of a DNA template and under conditions
otherwise appropriate to result in transcription elongation, i.e., in the presence of
effective amounts of nucleotides, ATP, other necessary co-factors and the like;



(i) more preferably, wherein the capacity to promote transcription elongation
when operatively combined with a human P-TEFb large subunit protein, a
functional human RNA polymerase II and the necessary transcriptional
elongation components, co-factors and precursors, is inhibited by an
effective amount of DRB.



In certain embodiments, the DNA segments and coding regions may encode Drosophila
and human P-TEFb kinase subunit peptides, for example from about 25 to about 30 or about 50
amino acids in length or so. Preferably, the DNA segments and coding sequences will encode a
Drosophila P-TEFb kinase subunit protein of about 404 amino acids in length; or a human
P-TEFb kinase subunit protein of about 372 amino acids in length, preferably where the human
subunit is used in combination with other DNA segments, such that a recombinant human
P-TEFb total enzyme is encoded.



The DNA segments and vectors of the present invention may comprise an isolated gene 
or coding region that encodes a Drosophila P-TEFb large subunit, protein or peptide. The 
Drosophila P-TEFb large subunit, protein or peptide may be generally characterized as: 



a) having a molecular weight of about 121 kD, generally as measured by gel
filtration chromatography and also by calculating the molecular weight from the
known protein sequence; and optionally characterized as

b) capable of forming a cyclin kinase pair with the Drosophila P-TEFb kinase
subunit protein, i.e., being capable of binding to the Drosophila P-TEFb kinase
subunit protein under suitable binding conditions, such as conditions using
non-denaturing buffers of non-dissociating ionic strength and non-dissociating pH,
which buffers generally approximate to the non-denaturing, non-dissociating
cellular or nuclear conditions in which the P-TEFb protein complex exists in the
natural state; and optionally characterized as



c) having cyclin protein-like sequence characteristics, such as the conserved cyclin 
box domain; and optionally characterized as 



d) capable of promoting transcription elongation when combined with a Drosophila
P-TEFb kinase subunit protein, i.e., capable of promoting proper elongation of
mRNA transcripts when operatively combined with a Drosophila P-TEFb kinase
subunit to form a P-TEFb enzyme complex and wherein the Drosophila P-TEFb
enzyme complex is contacted with a functional Drosophila RNA polymerase II
molecule in the presence of a DNA template and under conditions otherwise
appropriate to result in transcription elongation, i.e., in the presence of effective
amounts of nucleotides, ATP, other necessary co-factors and the like,



(i) more preferably, wherein the capacity to promote transcription elongation
when operatively combined with a Drosophila P-TEFb kinase subunit
protein, a functional Drosophila RNA polymerase II and the necessary
transcriptional elongation components, co-factors and precursors, is
inhibited by an effective amount of DRB.



Certain genes and DNA segments preferably encode a substantially full length P-TEFb
large subunit, protein or polypeptide that includes a contiguous amino acid sequence of at least
about 7 amino acids, or more preferably, of at least about 8, 10, 12, 14, 16, 18, 20, 22 or 25
amino acids or so from SEQ ID NO:4, or a biologically functional equivalent thereof; or the
genes and DNA segments will hybridize to such a coding sequence under stringent hybridization
conditions. More preferably, the genes encode a P-TEFb large subunit, protein or peptide that
includes a contiguous amino acid sequence of at least about 30, 35, 40, 45, 50, 60, 70, 80, 90,
100, 125, 150, 200 amino acids or so from SEQ ID NO:4, or a biologically functional equivalent
thereof; or the genes and DNA segments will hybridize to such a coding sequence under
stringent hybridization conditions. Most preferably, these genes and DNA segments will encode
a P-TEFb large subunit having the amino acid sequence of SEQ ID NO:4, or a biologically
functional equivalent thereof; or the genes and DNA segments will hybridize to such a coding
sequence under stringent hybridization conditions.
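
The contiguous-sequence criterion recited above can be illustrated computationally. The following is a minimal sketch only, assuming that sequences are supplied as one-letter amino acid strings. The function names and the toy sequences are hypothetical and are not taken from SEQ ID NO:4, and the sketch does not address biologically functional equivalents or hybridization.

```python
def longest_shared_run(candidate: str, reference: str) -> int:
    """Length of the longest contiguous stretch common to both sequences."""
    best = 0
    # prev[j] = length of the common run ending at the previous candidate
    # residue and at reference[j-1]
    prev = [0] * (len(reference) + 1)
    for c in candidate:
        curr = [0] * (len(reference) + 1)
        for j, r in enumerate(reference, start=1):
            if c == r:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def includes_contiguous_stretch(candidate: str, reference: str, min_len: int = 7) -> bool:
    """True if candidate shares at least min_len contiguous residues with reference."""
    return longest_shared_run(candidate, reference) >= min_len


# Toy placeholder sequences (hypothetical, for illustration only).
REFERENCE = "MKTAYIAKQRQISFVKSHFSRQ"
PEPTIDE_HIT = "GGAKQRQISFGG"   # contains the run "AKQRQISF" found in REFERENCE
PEPTIDE_MISS = "GGAKQRQGGISF"  # the same residues, interrupted by "GG"
```

The default threshold of 7 mirrors the shortest stretch recited in the text; any of the longer recited lengths can be passed as `min_len`.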

These isolated genes and coding regions may include a contiguous nucleic acid sequence
of at least about 20-21 nucleotides or so, and more preferably, of at least about 30, 40, 50, 60 or
72 or so nucleotides from between position 716 and position 4054 of SEQ ID NO:3, or a
biologically functional equivalent thereof; or the genes and DNA segments will hybridize to
such a coding sequence under stringent hybridization conditions. Preferably, these isolated genes
and DNA segments will comprise an isolated coding region having the nucleic acid sequence of
SEQ ID NO:3, or a biologically functional equivalent thereof; or the genes and DNA segments
will hybridize to such a coding sequence under stringent hybridization conditions.

Exemplary genes and DNA segments may also be characterized as encoding a
substantially full length P-TEFb large subunit including a contiguous amino acid sequence of at
least about 7 amino acids, or more preferably, of at least about 8, 10, 12, 14, 16, 18, 20, 22 or 25
amino acids or so from SEQ ID NO:4, or a biologically functional equivalent thereof; and as
hybridizing to the nucleic acid sequence of SEQ ID NO:3 under stringent hybridization
conditions.

In certain embodiments, the isolated DNA segments and coding regions may encode a
P-TEFb large subunit peptide of from about 15 to about 30 or about 50 amino acids in length or
so. Preferably, the DNA segments and coding regions encode a Drosophila P-TEFb large
subunit protein of about 1113 amino acids in length.

The present invention provides several human P-TEFb large subunit genes, proteins and
compositions. Methods of using the various compositions, for example, in the diagnosis and
treatment of a viral infection, such as HIV, or cancer are also provided. Human P-TEFb large
subunit proteins and peptides are generally characterized as:

a) capable of forming a cyclin kinase pair with the human P-TEFb kinase subunit
protein, i.e., being capable of binding to the human P-TEFb kinase subunit protein
of SEQ ID NO:6 under suitable binding conditions, such as conditions using
non-denaturing buffers of non-dissociating ionic strength and non-dissociating pH,
which buffers generally approximate to the non-denaturing, non-dissociating
cellular or nuclear conditions in which the P-TEFb protein complex exists in the
natural state; and

b) having cyclin protein-like sequence characteristics, such as the conserved cyclin
box domain; and

c) being capable of promoting transcription elongation when combined with a
human P-TEFb kinase subunit protein, i.e., capable of promoting proper
elongation of mRNA transcripts when operatively combined with a human
P-TEFb kinase subunit to form a human P-TEFb enzyme complex and wherein
the P-TEFb enzyme complex is contacted with a functional human RNA
polymerase II molecule in the presence of a DNA template and under conditions
otherwise appropriate to result in transcription elongation, i.e., in the presence of
effective amounts of nucleotides, ATP, other necessary co-factors and the like,

(i) more preferably, wherein the capacity to promote transcription elongation
when operatively combined with a human P-TEFb kinase subunit protein,
a functional human RNA polymerase II and the necessary transcriptional
elongation components, co-factors and precursors, is inhibited by an
effective amount of DRB.

The human genes and DNA segments preferably encode a substantially full length human
P-TEFb large subunit, protein or polypeptide that includes a contiguous amino acid sequence of
at least about 6 or 7 amino acids or so, or more preferably, of at least about 8, 10, 12, 14, 16, 18,
20, 22 or 25 amino acids or so from SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or a
biologically functional equivalent thereof; or the genes and DNA segments will hybridize to such
a coding sequence under stringent hybridization conditions. More preferably, the genes encode a
P-TEFb large subunit, protein or peptide that includes a contiguous amino acid sequence of at
least about 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200 amino acids or so from SEQ ID
NO:45, SEQ ID NO:47 or SEQ ID NO:50, or a biologically functional equivalent thereof; or the
genes and DNA segments will hybridize to such a coding sequence under stringent hybridization
conditions.

In certain preferred aspects, the human genes and DNA segments of the present invention
will encode a P-TEFb large subunit having the amino acid sequence of SEQ ID NO:45, SEQ ID
NO:47 or SEQ ID NO:50, or a biologically functional equivalent thereof; or the genes and DNA
segments will hybridize to such a nucleic acid segment under stringent hybridization conditions.

The isolated human genes and coding regions may include a contiguous nucleic acid
sequence of at least about 20-21 nucleotides or so, and more preferably, of at least about 30, 40,
50, 60 or 72 or so nucleotides from the coding region of SEQ ID NO:43 or SEQ ID NO:48, or a
biologically functional equivalent thereof; or the genes and DNA segments will hybridize to
such a coding sequence under stringent hybridization conditions. Preferably, the isolated genes
and DNA segments will comprise an isolated coding region having the nucleic acid sequence of
SEQ ID NO:44, SEQ ID NO:46 or SEQ ID NO:49, or a biologically functional equivalent
thereof; or the genes and DNA segments will hybridize thereto under stringent hybridization
conditions.

Exemplary human genes and DNA segments may also be characterized as encoding a
substantially full length P-TEFb large subunit including a contiguous amino acid sequence of at
least about 7 amino acids, or more preferably, of at least about 8, 10, 12, 14, 16, 18, 20, 22 or 25
amino acids or so from SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or a biologically
functional equivalent thereof; and as hybridizing to the nucleic acid sequence of SEQ ID NO:43
or SEQ ID NO:48 under stringent hybridization conditions.

The present invention also provides DNA segments comprising a first isolated gene or 
coding region that encodes a Drosophila or human P-TEFb kinase subunit, protein or 
polypeptide and a second isolated gene or coding region that encodes a corresponding 
Drosophila or human P-TEFb large subunit, protein or peptide. 

As such, the present invention provides an expression system comprising: 

a) a first expression unit comprising, under the transcriptional control of a promoter, 
a first coding region that encodes a substantially full length P-TEFb kinase 
subunit that includes a contiguous sequence of at least about 7 amino acids from 
SEQ ID NO:2 or SEQ ID NO:6 and 

b) a second expression unit comprising, under the transcriptional control of a 
promoter, a second coding region that encodes a substantially full length P-TEFb 
large subunit that includes a contiguous sequence of at least about 7 amino acids 
from SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50.

Such expression systems may comprise a first and second expression unit on a single 
expression vector; or a first and second expression unit on two distinct expression vectors. The 
expression systems may be advantageously comprised within a recombinant host cell. 

Such host cells will generally express a substantially full length P-TEFb kinase subunit 
and a substantially full length P-TEFb large subunit. These cells will therefore produce P-TEFb 
enzyme complexes comprising subunits generally of the same species. However, cross-species 
P-TEFb enzyme complexes, e.g., those that comprise one Drosophila subunit and one human 
subunit, are also contemplated.

Isolated, functional P-TEFb enzyme complexes thus form another aspect of the present
invention. The Drosophila P-TEFb enzyme complex is generally characterized as:

a) comprising at least two subunits in operative association, a first, kinase subunit 
having a molecular weight of about 42 kD, and a second, large subunit (or cyclin- 
related subunit) having a molecular weight of about 121 kD; 

b) being naturally localized to the cell nucleus and capable of interacting with and 
phosphorylating Drosophila RNA polymerase II present in an early elongation 
complex under suitable cellular conditions, or under suitable in vitro binding 
conditions that effectively duplicate the non-denaturing, non-dissociating cellular 
transcription elongation conditions; 

c) being capable of phosphorylating Drosophila RNA polymerase II; and preferably, 
capable of phosphorylating Drosophila RNA polymerase II on the carboxyl 
terminal domain (CTD) of the large subunit of RNA polymerase II; and more 
preferably, wherein the capacity to phosphorylate Drosophila RNA polymerase II 
is inhibited by an effective amount of DRB; and 

d) being capable of promoting transcriptional elongation, which is also termed 
promoting the transition from an abortive to a productive elongation mode, i.e.,
being capable of promoting proper or productive elongation of mRNA transcripts
(rather than abortive elongation) when operatively combined with a functional
Drosophila RNA polymerase II molecule in the presence of a DNA template and
under conditions otherwise appropriate to result in transcription elongation, i.e., in
the presence of effective amounts of nucleotides, ATP, other necessary co-factors 
and the like; and preferably, wherein the capacity to promote transcriptional 
elongation is inhibited by an effective amount of DRB. 

The human P-TEFb enzyme complex is generally characterized as:

a) comprising at least two subunits in operative association, a first, kinase subunit
preferably comprising the sequence of SEQ ID NO:6, and a second, large or 
cyclin-related subunit preferably comprising the sequence of SEQ ID NO:45, 
SEQ ID NO:47 or SEQ ID NO:50; 

b) being naturally localized to the human cell nucleus and capable of interacting with 
and phosphorylating human RNA polymerase II present in an early elongation 
complex under suitable cellular conditions, or under suitable in vitro binding 
conditions that effectively duplicate the non-denaturing, non-dissociating cellular 
transcription elongation conditions; 

c) being capable of phosphorylating human RNA polymerase II; and preferably, 
capable of phosphorylating human RNA polymerase II on the carboxyl terminal 
domain (CTD) of the large subunit of RNA polymerase II; and more preferably, 
wherein the capacity to phosphorylate human RNA polymerase II is inhibited by 
an effective amount of DRB; and 

d) being capable of promoting transcriptional elongation, which is also termed
promoting the transition from an abortive to a productive elongation mode, i.e.,
being capable of promoting proper or productive elongation of mRNA transcripts
(rather than abortive elongation) when operatively combined with a functional
human RNA polymerase II molecule in the presence of a DNA template and
under conditions otherwise appropriate to result in transcription elongation, i.e., in
the presence of effective amounts of nucleotides, ATP, other necessary co-factors 
and the like; and preferably, wherein the capacity to promote transcriptional 
elongation is inhibited by an effective amount of DRB. 

DNA segments and isolated genes may also be manipulated to encode a P-TEFb subunit 
fusion protein or polypeptide construct in which at least one P-TEFb subunit, protein,
polypeptide or even peptide is operatively attached to a second coding region that encodes a
selected peptide or protein sequence. Combinations of P-TEFb subunit sequences, including
human subunits, with selected antigenic amino acid sequences; selected non-antigenic carrier
amino acid sequences, for use in immunization; selected adjuvant sequences; amino acid
sequences with specific binding affinity for a selected molecule; or amino acid sequences that form
an active DNA binding or transactivation domain are particularly contemplated. Certain fusion
proteins may be linked together via a protease-sensitive peptide linker, allowing subsequent easy
separation.

The DNA segments intended for use in expression will be operatively positioned under 
the control of, i.e., downstream from, a promoter that directs expression of a P-TEFb subunit,
protein or polypeptide in a desired host cell, such as E. coli, or in certain other preferred 
embodiments in an insect, mammalian or human cell. The promoter may be a recombinant 
promoter, or a promoter naturally associated with P-TEFb. Recombinant vectors, including 
baculoviral vectors, thus form another aspect of the present invention. The recombinant vectors 
may express a Drosophila P-TEFb kinase subunit, protein or polypeptide and a Drosophila 
P-TEFb large subunit, protein or polypeptide. Recombinant vectors expressing human P-TEFb 
large subunit proteins or polypeptides, human P-TEFb kinase subunits in combination with large 
subunits or even human kinase subunits for various uses are also provided. 

Although sequences encoding substantially full length subunits are preferred, the 
invention further provides nucleic acid probes and primers and other nucleic acid segments, 
including those characterized as including: 

a) comprising a sequence region that consists of at least about 120, 150, 200 or so 
contiguous nucleotides that have the same sequence as, or are complementary to, 
about 120, 150, 200 or so contiguous nucleotides selected from any region of SEQ 
ID NO:3; or

b) a nucleic acid segment of from about 120, 150, 200 or so to about 20,000
nucleotides in length that hybridizes to any region of the nucleic acid segment of
SEQ ID NO:3, or the complement thereof under standard hybridization
conditions, and particularly under hybridization conditions in which the
Drosophila sequence will not bind to the human kinase subunit sequence.
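
The "same sequence as, or complementary to" criterion in item a) can be sketched in code. This is an illustrative sketch only, assuming DNA sequences written 5' to 3' as strings over A, C, G and T. The function names are hypothetical, and exact string matching is merely a stand-in for sequence identity: it does not model hybridization under any particular conditions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shares_window(probe: str, target: str, min_len: int) -> bool:
    """True if some min_len-nucleotide window of probe occurs verbatim in target."""
    probe, target = probe.upper(), target.upper()
    if min_len <= 0 or len(probe) < min_len:
        return False
    return any(probe[i:i + min_len] in target
               for i in range(len(probe) - min_len + 1))


def same_or_complementary(probe: str, target: str, min_len: int) -> bool:
    """True if probe has min_len contiguous nucleotides that are identical to,
    or complementary to, a stretch of target."""
    return (shares_window(probe, target, min_len)
            or shares_window(probe, reverse_complement(target), min_len))
```

In use, `min_len` would be set to one of the recited lengths (for example 120, 150 or 200) and `target` to the reference sequence of interest.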

However, in defined regions of the sequence, e.g., those with less conservation, segments
of SEQ ID NO:3, or the complement thereof, may variously be about 20, 25, 30, 50, 100,
200, 500, or 1000 or so nucleotides in length, up to and including full length sequences, or even
longer, as may be achieved by duplication of certain regions. Where the sequence of SEQ ID
NO:3 is concerned, sequences of at least about 2000, 3000 or 4000 nucleotides of SEQ ID NO:3, or
complements thereof, are provided, up to and including the full length sequence of 4328
contiguous nucleotides of SEQ ID NO:3, or the complement thereof.

In addition, the invention provides nucleic acid probes and primers and other nucleic acid
segments, including those characterized as including:

a) comprising a sequence region that consists of at least about 360, 400, 450 or so
contiguous nucleotides that have the same sequence as, or are complementary to,
about 360, 400, 450 or so contiguous nucleotides selected from any region of SEQ
ID NO:43, SEQ ID NO:46 or SEQ ID NO:48; or

b) a nucleic acid segment of from about 360, 400, 450 or so to about 20,000
nucleotides in length that hybridizes to any region of the nucleic acid segment of
SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48, or the complement thereof
under standard hybridization conditions.

Again, in certain defined regions of the sequences, e.g., those with less conservation as
defined herein, segments of each of SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48, or the
complements thereof, may variously be about 20, 25, 30, 50, 100, 200, 500, or 1000 or so
nucleotides in length, up to and including full length sequences, or even longer, as may be
achieved by duplication of certain regions.

It will be readily understood that what is meant by "defined regions of the sequence, e.g.,
those with less conservation" are contiguous stretches of at least about 20 nucleotides that are not
identical or complementary to known nucleic acid sequences. Such defined regions include, for
example, positions 1-258, 320-345 and 1244-1457 of SEQ ID NO:1; positions 587-964, 1156-
1711, 1764-3287, 3460-3775 and 3800-4328 of SEQ ID NO:3; and preferably, positions 1-244,
297-546, 867-1142, 1895-2331, 2821-2890, 3341-3442, 3953-3860 and 4491-4528 of SEQ ID
NO:43; and positions 1-209, 418-667, 919-1031, 2045-2164 and 2219-2360 of SEQ ID NO:48.
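
As an illustration of how the defined regions listed above might be applied, the following minimal sketch tests whether a candidate segment of SEQ ID NO:3 lies wholly within one of the listed regions. The coordinates are those recited in the text for SEQ ID NO:3, treated here as 1-based and inclusive, which is an assumption. The function name is hypothetical.

```python
# Defined regions of SEQ ID NO:3 as listed in the text
# (assumed to be 1-based, inclusive positions).
SEQ3_DEFINED_REGIONS = [
    (587, 964),
    (1156, 1711),
    (1764, 3287),
    (3460, 3775),
    (3800, 4328),
]


def within_defined_region(start: int, end: int,
                          regions=SEQ3_DEFINED_REGIONS,
                          min_len: int = 20) -> bool:
    """True if the inclusive span [start, end] is at least min_len nucleotides
    long and lies entirely inside one of the listed regions."""
    if end - start + 1 < min_len:
        return False
    return any(lo <= start and end <= hi for lo, hi in regions)
```

The default `min_len` of 20 reflects the "at least about 20 nucleotides" language of the paragraph above. The same function can be reused for the other sequences by passing their recited regions as `regions`.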

Where the sequence of SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48 is concerned,
sequences of at least about 2000, 3000, 4000 nucleotides of SEQ ID NO:3, or complements
thereof are provided, up to and including the full length sequence of 4528 contiguous nucleotides
of SEQ ID NO:43, including the full length sequence of 2190 contiguous nucleotides of SEQ ID
NO:46 or including the full length sequence of 2360 contiguous nucleotides of SEQ ID NO:48,
or the complement thereof.

Any segment may be combined into a DNA segment or vector of up to about 30,000,
about 20,000, or about 15,000 base pairs in length. Segments of up to about 20,000, 15,000, and
10,000 base pairs in length will generally be preferred, and segments of up to about 5,000
base pairs in length are also provided.

The nucleic acids of the present invention may also be DNA segments or RNA segments.
Nucleic acid detection kits are also provided.

The present invention further provides recombinant host cells comprising at least one 
DNA segment or vector that comprises an isolated gene or coding region that encodes a 
Drosophila or human P-TEFb subunit protein, polypeptide, domain or any fusion protein thereof.
Prokaryotic host cells, such as E. coli, are provided, as are eukaryotic host cells, such as insect
cells and mammalian cells.

The recombinant host cells may further comprise an operative HIV Tat protein or active
fragment thereof. Such recombinant host cells may be provided with the HIV Tat protein or
peptide in vitro, for example, to test P-TEFb subunit protein and HIV Tat interactions, or may
naturally express HIV Tat, including cells provided with P-TEFb subunit proteins, peptides or
domains in vivo and in vitro.

The recombinant host cells of the present invention preferably have one or more DNA
segments introduced into them by means of a recombinant vector, and preferably express the
DNA segment to produce the encoded P-TEFb subunit, protein or polypeptide. The recombinant
host cells may express a P-TEFb subunit, protein or polypeptide including a contiguous amino
acid sequence of at least about 7 amino acids from SEQ ID NO:2 or SEQ ID NO:4, and
preferably express a P-TEFb subunit, protein or polypeptide having the amino acid sequence of
SEQ ID NO:2 or SEQ ID NO:4.

More preferably, the recombinant host cells will express human P-TEFb large subunits,
proteins or peptides that include a contiguous amino acid sequence of at least about 7 amino
acids from SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, and preferably express P-TEFb
subunits having the amino acid sequence of SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50.
Cells expressing any of the foregoing large subunits in combination with human P-TEFb kinase
subunits, proteins or peptides, preferably those that include a contiguous amino acid sequence
from SEQ ID NO:6, are also provided.

The recombinant host cells of the present invention may express a P-TEFb subunit,
protein or polypeptide fusion protein in which a contiguous P-TEFb subunit amino acid sequence
is operatively attached to a selected peptide or protein sequence. Also provided are recombinant
host cells that express a Drosophila or human P-TEFb kinase subunit and a Drosophila or human
P-TEFb large subunit. Cells that allow the production of human holoenzymes are preferred.

Methods for detecting P-TEFb nucleic acids in cells or samples are also provided, and
generally comprise obtaining sample nucleic acids from a sample suspected of containing
P-TEFb nucleic acids, contacting the sample nucleic acids with a nucleic acid segment that
encodes a P-TEFb subunit, protein or polypeptide, preferably a human P-TEFb subunit, protein
or polypeptide, under conditions effective to allow hybridization of substantially complementary
nucleic acids, and detecting the hybridized complementary nucleic acids thus formed. Use of
nucleic acid segments that comprise a discriminating contiguous sequence from the sequence of
SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:43, SEQ ID NO:48, SEQ ID NO:44,
SEQ ID NO:46 or SEQ ID NO:49 is preferred, with discriminating nucleic acid sequences from
the sequence of SEQ ID NO:43, SEQ ID NO:48, SEQ ID NO:44, SEQ ID NO:46 or SEQ ID
NO:49 being more preferred.



As the P-TEFb enzyme complex is herein shown to be essential to the transcription 
elongation process in eukaryotic, mammalian and human cells, it is envisioned that the levels of 
P-TEFb may correlate with various diseases, such as cancer. Accordingly, the present invention 
further provides methods of determining the levels of P-TEFb in cells or tissue samples, 
including tumor cells and samples, which generally comprise obtaining a biological sample 
suspected of containing P-TEFb; contacting the sample with a biological reagent that detects 
P-TEFb, under conditions effective to allow detection; and determining the level of P-TEFb 
detected. The "biological reagent that detects P-TEFb" may be a nucleic acid segment that 
encodes a human P-TEFb subunit, protein or polypeptide or it may be an antibody that has 
specific binding affinity for a human P-TEFb subunit, protein or polypeptide. 

It is also contemplated that the "type" of P-TEFb, i.e., the presence of one or more
P-TEFb proteins or mutants thereof, may correlate with diseases, such as cancer. Accordingly,
the invention even further provides methods of determining whether P-TEFb mutants or one or
more different P-TEFb proteins are present in cells or tissue samples. The methods generally
comprise obtaining a biological sample suspected of containing a mutant P-TEFb or different
P-TEFb protein; contacting the sample with a biological reagent capable of detecting a mutant,
distinct from a wild type, P-TEFb or a different, or second, P-TEFb protein, under conditions
effective to allow differential detection; and determining whether another P-TEFb protein or a
P-TEFb mutant is present.

The "biological reagent that detects mutant P-TEFb" will generally be a nucleic acid
segment or gene, or an antibody, that has specificity for the mutant sequence or protein in
preference to the wild type sequence or protein, allowing effective differentiation between the
two, as may be used in diagnostic tests for cancer cells or patients.

The "biological reagent that detects different P-TEFb" will generally be a nucleic acid
segment or gene, or an antibody, that has specificity for another wild type sequence or protein in
preference to the first wild type sequence or protein, allowing effective differentiation between
the two, as may be used in diagnostic tests for cancer cells or patients. The use of nucleic acid
segments, probes or primers that differentiate between the three different human P-TEFb large
subunit proteins provided herein, i.e., proteins having the amino acid sequence of SEQ ID
NO:45, SEQ ID NO:47 or SEQ ID NO:50, is particularly contemplated.

The invention therefore also includes the provision of DNA segments, vectors, genes and
coding sequence regions that encode Drosophila or human P-TEFb proteins, polypeptides,
domains, peptides or any fusion protein thereof, where the P-TEFb protein element comprises at
least one mutation in comparison to the wild type sequence. The mutation may be deliberately
introduced by the hand of man, for example, in order to test the function of the changed amino
acid, e.g., in Tat or RNA polymerase II binding, and/or other functions. The mutation may
also be discovered in the natural population, as may be connected with dysfunction or disease.

Once a correlation between the levels of P-TEFb and a disease, such as cancer, has been
confirmed, the present invention further provides methods for diagnosing said disease in other
patients. Diagnostically, the present invention then provides methods for identifying a patient
having or at risk for developing, for example, cancer; the methods comprising determining the
type or amount of P-TEFb present within a biological sample from the patient, wherein the
presence of a type or amount of P-TEFb different to the type or amount of P-TEFb present in a
corresponding sample from a normal subject, is indicative of a patient having or at risk for
developing the disease.

Methods of using DNA segments that include an isolated P-TEFb subunit gene or coding
region, including human DNA segments, are provided, wherein the methods comprise expressing
a P-TEFb subunit DNA segment in a recombinant host cell and collecting the P-TEFb subunit
protein, polypeptide, domain or fusion protein expressed by the cell. This method may be
represented by the steps of:

a) preparing a recombinant vector in which a P-TEFb subunit-encoding DNA segment,
preferably a human DNA segment, is positioned under the control of a promoter;

b) introducing the recombinant vector into a recombinant host cell;

c) culturing the recombinant host cell under conditions effective to allow expression of an
encoded P-TEFb subunit protein, polypeptide, domain or fusion protein; and

d) collecting the expressed P-TEFb subunit protein, polypeptide, domain or fusion
protein.

The invention further provides recombinant P-TEFb subunit polypeptides, proteins and
fusion proteins, preferably of human origin, prepared at levels that could not be
obtained prior to the present invention. The methods comprise expressing a gene encoding a
P-TEFb subunit polypeptide, protein or fusion protein in a recombinant host cell and purifying
the expressed polypeptide, protein or fusion protein away from total recombinant host cell
components to prepare between about 100 µg and about 1000 mg of a recombinant P-TEFb
subunit polypeptide, protein or fusion protein. With scale up, the inventor contemplates that
10-fold increases can be achieved, yielding up to about 10 g of recombinant P-TEFb proteins.

The invention also provides compositions comprising isolated P-TEFb subunit peptides, proteins
or fusion proteins in all amounts between about 100 µg and about 1000 mg, such as between
about 500 µg and about 100 mg.

P-TEFb fusion proteins or constructs including P-TEFb subunit, protein or polypeptide 
sequences operatively attached to distinct, selected amino acid sequences, such as selected 
antigenic amino acid sequences, amino acid sequences with selected binding affinity, and DNA 
binding or transactivation amino acid sequences, are also encompassed within the invention. 
Particularly, P-TEFb subunit, protein or polypeptide sequences operatively attached to 
glutathione-S-transferase amino acid sequences are provided. Fusion proteins with selectably- 
cleavable bonds are also provided. 

Compositions comprising isolated and purified P-TEFb kinase subunit proteins, 
polypeptides or fusion proteins that include a contiguous amino acid sequence of at least about 
24 amino acids, and more preferably, of at least about, 25, 27, 30, 35, 40, 45, 50, 60, 70, 80, 90, 
100, 125, 150, 200 amino acids or so from SEQ ID NO:2, or a biologically functional equivalent 
thereof, are also provided. P-TEFb kinase subunit proteins, polypeptides or fusion proteins 
having the amino acid sequence of SEQ ID NO:2 or a biologically functional equivalent thereof 
are preferred. 

Also provided are isolated and purified P-TEFb large subunit proteins, polypeptides or 
fusion proteins that include a contiguous amino acid sequence of at least about 6 or 7 or so amino 
acids, and more preferably, of at least about, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50, 60, 
70, 80, 90, 100, 125, 150 or 200 amino acids or so from SEQ ID NO:4, or a biologically 
functional equivalent thereof. P-TEFb large subunit proteins, polypeptides or fusion proteins 
having the amino acid sequence of SEQ ID NO:4 or a biologically functional equivalent thereof 
are more preferred. 






Still more preferred aspects of this invention are isolated and purified human P-TEFb
large subunit proteins, polypeptides or fusion proteins. These components generally include a
contiguous amino acid sequence of at least about 6 or 7 or so amino acids, and more preferably,
of at least about, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150 or
200 amino acids or so from SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or a biologically
functional equivalent thereof. P-TEFb large subunit proteins, polypeptides or fusion proteins
having the amino acid sequence of SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or a
biologically functional equivalent thereof, are even more preferred still.

Further provided are compositions comprising isolated and purified active Drosophila
P-TEFb enzyme complexes comprising P-TEFb kinase subunits in operative association with
P-TEFb large subunits, preferably at levels that could not be previously obtained. Further
provided are compositions comprising isolated and purified active human P-TEFb enzyme
complexes comprising P-TEFb kinase subunits in operative association with P-TEFb large
subunits, preferably at levels that could not be previously obtained.

The invention further provides compositions comprising an HIV Tat protein in 
combination with an active P-TEFb enzyme complex comprising P-TEFb kinase subunits in 
operative association with P-TEFb large subunits. The complex is preferably a human complex. 


The present invention provides P-TEFb immunodetection reagents. The immunodetection 
reagents may be characterized as: 



a) an antibody that has immunospecificity for a Drosophila P-TEFb kinase subunit 
protein, preferably a protein of at least about 7 amino acids from SEQ ID NO:2, 
or more preferably, a protein of SEQ ID NO:2; an antibody that has 
immunospecificity for a Drosophila P-TEFb large subunit protein, preferably a 
protein of at least about 7 amino acids from SEQ ID NO:4, or more preferably, a 
protein of SEQ ID NO:4; or an antibody that has immunospecificity for a human 
P-TEFb large subunit protein, preferably a protein of at least about 7 amino acids 
from SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or more preferably, a 
protein of SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50; any of which 
antibodies may be operatively attached to a detectable label; or 

b) an antibody that has immunospecificity for a human P-TEFb kinase subunit 
protein, preferably a protein of SEQ ID NO:6, wherein the antibody is operatively 
attached to a detectable label. 

The immunodetection reagent comprising an antibody that has immunospecificity for a 
human P-TEFb large subunit protein may further be attached to a detectable label. The 
detectable labels for use in the present invention may be a radioactive label, a fluorescent label, 
biotin, avidin or an enzyme that will generate a detectable product, preferably a colored 
product, upon contact with an appropriate substrate. The antibodies for use in the 
immunodetection reagents of the present invention are preferably monoclonal antibodies. 

The present invention provides immunodetection kits, certain of which may be described 
as comprising: 

a) a Tat composition comprising a purified HIV Tat protein; 

b) a P-TEFb composition comprising a purified human P-TEFb subunit; and 

c) an immunodetection means. 

The P-TEFb composition may comprise a purified human P-TEFb kinase subunit, a 
purified human P-TEFb large subunit, or a purified human P-TEFb enzyme complex comprising 
a P-TEFb kinase subunit bound to a P-TEFb large subunit. 

The immunodetection kits of the present invention provide a variety of immunodetection 
means. The immunodetection means may be a detectable label that is operatively attached to the 
HIV Tat protein. In other kits, the immunodetection means may be a first anti-Tat antibody that 
binds to the HIV Tat protein, preferably wherein the first anti-Tat antibody is operatively 
attached to a detectable label. Additionally, the immunodetection means may be a detectable 
label that is operatively attached to the human P-TEFb subunit. 

Alternatively, the immunodetection means may be a first anti-P-TEFb antibody that binds 
to a human P-TEFb subunit. The invention further provides kits wherein the first anti-P-TEFb 
antibody binds to human P-TEFb kinase subunits, human P-TEFb large subunits, or to intact 
human P-TEFb enzyme complexes. Further, the first anti-P-TEFb antibody may be operatively 
attached to a detectable label. 

Certain other kits may comprise a first anti-Tat antibody or a first anti-P-TEFb antibody, 
wherein the immunodetection means is a detectable label that is operatively attached to a second 
antibody that has binding affinity for the first antibody. In such kits, the HIV Tat protein or the 
human P-TEFb subunit may be bound to a solid support. 

The present invention further advantageously provides methods for identifying genes that 
encode P-TEFb subunit proteins from a desired species based upon the "two hybrid screening 
system". The methods rely on the use of one defined subunit to clone a matching subunit of the 
desired species, such that one of the binding pair (kinase or large subunit) is known and the other 
is identified by practicing the method. Accordingly, where one desires to clone a P-TEFb large 
subunit protein one uses the "complementary" P-TEFb kinase subunit protein in the screen. That 
is the meaning of "complementary" as used in the following methodological description. The 
methods may thus be characterized as comprising the steps of: 


a) obtaining a first DNA segment comprising a candidate P-TEFb subunit gene from 
a desired species; the first DNA segment expressing a first fusion protein 
comprising a transcriptional transactivating domain operatively attached to the 
candidate P-TEFb subunit protein encoded by the candidate gene; 

b) obtaining a second DNA segment that expresses a second fusion protein 
comprising the complementary P-TEFb subunit protein operatively attached to a 
DNA binding domain that binds to a defined nucleic acid sequence; 

c) providing the first and second DNA segments to a eukaryotic host cell that 
comprises a marker gene operatively positioned downstream of the defined 
nucleic acid sequence; and 

d) identifying a eukaryotic host cell that expresses the marker gene, thereby 
identifying the candidate gene as a gene of the desired species that encodes a 
P-TEFb subunit protein. 

The methods generally further comprise isolating the identified candidate P-TEFb subunit 
gene from the first DNA segment within the eukaryotic host cell. U.S. Patent 5,667,973 is 
specifically incorporated herein by reference for the purposes of providing further details 
concerning the execution of screening methods based upon this general technique. 

The transcriptional transactivating domains used in the present invention may be the 
GAL4 or VP16 transcriptional transactivating domain. The second fusion protein may comprise a 
GAL4 DNA binding domain, wherein the defined nucleic acid sequence comprises a GAL4 
binding domain recognition sequence, or a lexA DNA binding domain, wherein the defined 
nucleic acid sequence comprises a lexO binding site sequence. In the methods, the eukaryotic 
host cell may be a yeast host cell (yeast two hybrid system) or a mammalian host cell. 

In the two hybrid system methods of the present invention, marker genes preferred for 
use are the chloramphenicol acetyltransferase, β-galactosidase, green fluorescent protein, 
β-glucuronidase or luciferase genes, preferably the β-galactosidase gene. 

A further explanation of the two hybrid system cloning method for identifying a gene of a 
desired species that encodes a P-TEFb subunit protein is that it generally comprises 
the steps of: 



a) obtaining a plurality of first DNA segments comprising a plurality of candidate 
genes of the desired species; 

b) obtaining multiple copies of the second DNA segment; 

c) providing the plurality of first DNA segments and multiple copies of the second 
DNA segment to a population of eukaryotic host cells in an amount sufficient to 
provide about one first DNA segment and at least about one second DNA segment 
to each host cell in the population; 

d) culturing the population of cells under conditions and for a period of time 
effective to allow marker gene expression; and 

e) detecting a host cell from the population that expresses the marker gene, thereby 
identifying the presence in the cell of a first DNA segment that comprises a 
candidate gene of the desired species that encodes a P-TEFb subunit protein. 

The method also generally further comprises isolating the detected cell of step (e) free 
from the population of cells, and isolating the candidate gene of the desired species from the first 
DNA segment within the cell. 






Although human P-TEFb-encoding nucleic acids are directly provided by the present 
invention, the invention also enables the identification and use of additional P-TEFb-encoding 
nucleic acid segments of any defined species. Methods for obtaining such sequences generally 
comprise the steps of: 






a) obtaining at least one isolated nucleic acid segment designed to hybridize to a 
P-TEFb subunit gene of the desired species; 

b) contacting a population of nucleic acids of the desired species with the isolated 
nucleic acid segment under conditions effective to allow hybridization of the 
isolated nucleic acid segment to P-TEFb subunit nucleic acids within the 
population of nucleic acids of the desired species; and 

c) identifying a nucleic acid segment of the desired species that hybridizes to the 
isolated nucleic acid segment, preferably free from the population of nucleic acids 
of the desired species. 

The methods of identifying nucleic acids of defined species that encode P-TEFb subunit 
proteins may utilize a single isolated nucleic acid segment designed to hybridize to a P-TEFb 
subunit gene of the desired species, wherein the population of nucleic acids of the desired species 
is contacted with the single isolated nucleic acid segment under conditions in which the single 
isolated nucleic acid segment will bind to the large subunit sequence from the desired 
species. The single isolated nucleic acid segment may be designed from knowledge of the 
Drosophila or human genes provided by the present invention, or may be a probe sequence 
designed from a peptide purified from the desired species. 

The methods of identifying P-TEFb subunit-encoding nucleic acids of a desired species 
may also utilize a pair of isolated nucleic acid segments or probes designed to hybridize to 
spatially distant sequences from a P-TEFb subunit gene of the desired species, wherein a 
polymerase chain reaction is conducted to amplify the P-TEFb subunit gene located between the 
spatially distant sequences, using protocols that are known to those of skill in the art. The probes 
may again be designed from the Drosophila and human sequences of the present invention or 
may be designed from peptide sequences of the desired species. 

The present invention also provides methods of identifying a nucleic acid segment of a 
desired species that comprises an isolated coding region that encodes a P-TEFb subunit protein 
or peptide, wherein the isolated nucleic acid segment is designed from an analysis of at least one 
peptide sequence obtained from a P-TEFb subunit protein of the desired species. The peptide 
sequence may be an N-terminal or internal peptide sequence. In certain aspects of the present 
invention, the peptide sequence may be obtained by hydrolyzing a purified P-TEFb subunit 
protein from the desired species to form peptides and sequencing a peptide thus formed. The 
method of obtaining the peptide sequence may comprise the steps of: 

a) purifying a P-TEFb subunit protein from the desired species, preferably by 
fractionating a cellular extract comprising a P-TEFb subunit protein of the desired 
species and obtaining a fraction enriched for the subunit; 

b) hydrolyzing the purified P-TEFb subunit protein to form a population of peptides; 

c) isolating a single peptide from the population of peptides; and 

d) sequencing the isolated peptide. 

The present invention further provides methods for preparing purified P-TEFb subunit 
proteins of a desired species comprising the steps of: 

a) obtaining a nuclear extract from a cell of the desired species; 

b) subjecting the extract to fractionation via column chromatography, using a series 
of chromatography columns, such as phosphocellulose (P-11), DEAE cellulose, 
phenyl sepharose, hydroxylapatite, MonoQ, MonoS and the like; and preferably, 
subjecting the extract to fractionation via column chromatography using, in 
sequence, a phosphocellulose, phenyl sepharose, hydroxylapatite, MonoQ and 
MonoS column; and 

c) obtaining a fraction comprising the P-TEFb subunit protein substantially free 
from other protein components. 

The invention also provides methods of preparing purified P-TEFb subunit proteins of a 
desired species that comprise the steps of: 

a) obtaining a nuclear extract from the desired species; 

b) subjecting the extract to fractionation using an antibody that binds to a known 
P-TEFb subunit protein; 

c) purifying a fraction comprising the desired P-TEFb subunit protein in 
combination with the known P-TEFb subunit protein; and 

d) separating the desired P-TEFb subunit protein free from the known P-TEFb 
subunit protein. 

The antibodies used for preparing purified P-TEFb subunit proteins may be immobilized 
on a solid support, with the extract fractionated by applying the extract to the solid support. 
More preferably, the antibodies are monoclonal antibodies. These antibodies may be prepared by 
immunizing an animal with a P-TEFb subunit protein or peptide, and collecting the resultant 
antibodies. 

The present invention thus provides nucleic acid segments of any desired species that 
comprise an isolated coding region or gene that encodes a P-TEFb subunit protein of the desired 
species. These nucleic acid segments may also be comprised within a recombinant vector, which 
may be comprised within a recombinant host cell. 

Compositions comprising purified P-TEFb subunit proteins of any desired species are 
further provided. The compositions may be obtained from cells that naturally express the 
P-TEFb subunits, or obtained from a recombinant cell that has been engineered to express the 
P-TEFb subunit of the desired species. The compositions may be prepared by fractionating a cell 
extract comprising a P-TEFb subunit protein and purifying the subunit protein away from total 
cell components. The fractionation may comprise column chromatography, preferably affinity 
column chromatography, and more preferably immunoaffinity column chromatography using a 
column comprising an antibody that binds to the P-TEFb subunit protein of the desired species. 

The compositions comprising purified P-TEFb subunit proteins of a desired species may 
further comprise the other P-TEFb subunit protein of the binding pair, and may still further 
comprise a viral protein, such as a species-specific protein equivalent of the HIV Tat protein. 

In particularly useful embodiments, this invention also provides methods of assaying for 
P-TEFb. One of the methods comprises testing a composition suspected of containing P-TEFb 
for the ability to phosphorylate RNA polymerase II, wherein phosphorylation is indicative of a 
composition comprising at least a P-TEFb kinase subunit, and preferably is indicative of a 
composition comprising a functional P-TEFb holoenzyme. The method may be described as 
comprising the steps of: 

a) admixing (i) a test composition suspected of containing P-TEFb, (ii) a 
composition comprising at least the carboxyl terminal domain (CTD) of the large 
subunit of RNA polymerase II, and (iii) an effective phosphate donor compound 
comprising a labeled phosphate group; and 

b) determining the ability of the test composition to catalyze the transfer of the 
labeled phosphate group to the carboxyl terminal domain (CTD) of RNA 
polymerase II. 

The effective phosphate donor compound is preferably ATP that comprises a 32P- or 
33P-labeled terminal phosphate group, although GTP also functions in this regard. 

The invention also provides a phosphorylation-based method for identifying a candidate 
transcriptional inhibitor, comprising preparing a P-TEFb composition comprising at least a 
P-TEFb kinase subunit, and preferably comprising a functional P-TEFb holoenzyme, and testing 
the candidate inhibitor for the ability to inhibit P-TEFb-mediated phosphorylation of RNA 
polymerase II, wherein inhibition of phosphorylation is indicative of a candidate transcriptional 
inhibitor. The method may be described as comprising the steps of: 

a) obtaining a P-TEFb composition comprising at least a P-TEFb kinase subunit, and 
preferably comprising a functional P-TEFb holoenzyme; 

b) obtaining an RNA polymerase II composition comprising at least the carboxyl 
terminal domain (CTD) of the large subunit of RNA polymerase II; 

c) admixing the P-TEFb composition with the RNA polymerase II composition and 
an effective phosphate donor compound comprising a labeled phosphate group; 
and 

d) determining the ability of the P-TEFb composition to transfer the labeled 
phosphate group to the RNA polymerase II composition in the presence of the 
candidate transcriptional inhibitor and in the absence of the candidate 
transcriptional inhibitor, wherein a reduction in the amount of labeled phosphate 
transferred to RNA polymerase II in the presence of the candidate is indicative of 
a positive candidate transcriptional inhibitor. 
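
The comparison in step (d) can be expressed as a percent inhibition. The following is a minimal Python sketch, offered for illustration only: it computes the reduction in labeled phosphate transferred to the RNA polymerase II composition in the presence of a candidate, relative to the reaction without the candidate. The function names, the optional background correction and the 50% cutoff are assumptions made for the example; the specification does not prescribe a particular cutoff.

```python
def percent_inhibition(cpm_without: float, cpm_with: float, background: float = 0.0) -> float:
    """Percent reduction in labeled phosphate transferred to the CTD substrate.

    cpm_without: counts incorporated in the absence of the candidate
    cpm_with:    counts incorporated in the presence of the candidate
    background:  counts from a no-kinase control (assumed, optional)
    """
    signal_without = cpm_without - background
    signal_with = cpm_with - background
    if signal_without <= 0:
        raise ValueError("no kinase activity above background in the uninhibited reaction")
    return 100.0 * (1.0 - signal_with / signal_without)


def is_positive_candidate(cpm_without: float, cpm_with: float,
                          background: float = 0.0, threshold: float = 50.0) -> bool:
    """Flag a candidate whose inhibition meets an assumed, user-chosen threshold."""
    return percent_inhibition(cpm_without, cpm_with, background) >= threshold


if __name__ == "__main__":
    # Hypothetical counts, not measured data.
    print(percent_inhibition(cpm_without=12000, cpm_with=3000))
    print(is_positive_candidate(cpm_without=12000, cpm_with=3000))
```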

The P-TEFb composition may comprise a Drosophila P-TEFb composition or a human 
P-TEFb composition. The P-TEFb composition will preferably comprise a P-TEFb enzyme 
complex that has transcription elongation promoting activity. The terms "a P-TEFb enzyme 
complex that has transcription elongation promoting activity" and "a transcriptionally active 
P-TEFb enzyme complex" refer to a complex, generally comprising at least a kinase subunit and a 
large, cyclin-like subunit, which facilitates transcriptional elongation by RNA polymerase II, or 
removes a previous 'block' that was preventing proper transcriptional elongation, when admixed 
with an otherwise transcriptionally capable or competent composition. The P-TEFb enzyme 
complex may also be termed "transcriptionally active" or "elongationally active". 

The methods for identifying a candidate transcriptional inhibitor may further comprise 
testing the candidate transcriptional inhibitor so identified in a transcription elongation assay, 
wherein inhibition of transcription elongation confirms the identification of a transcriptional 
inhibitor. The transcription elongation assay may be described as comprising the steps of: 

a) preparing a transcriptionally competent composition capable of generating 
elongated RNA transcripts and comprising effective amounts of DNA template, 
P-TEFb enzyme complex, RNA polymerase II, each of the four nucleotides and ATP; 
and 

b) determining the ability of the transcriptionally competent composition to generate 
elongated RNA transcripts in the presence of the candidate transcriptional 
inhibitor and in the absence of the candidate transcriptional inhibitor, wherein a 
reduction in the amount of elongated RNA or mRNA transcripts in the presence 
of the candidate is indicative of the identification or confirmation of a 
transcriptional inhibitor. 

The invention thus provides transcriptional inhibitors, prepared by a process comprising 
testing a candidate transcriptional inhibitor substance for the ability to inhibit P-TEFb-mediated 
phosphorylation of RNA polymerase II and identifying a transcriptional inhibitor as a candidate 
substance that inhibits the phosphorylation. 

The invention also provides methods for identifying an HIV Tat protein, comprising 
contacting a composition suspected of containing an HIV Tat protein with a human P-TEFb 
composition under conditions effective to allow the formation of bound protein complexes and 
detecting the bound protein complexes so formed. The human P-TEFb composition may 
comprise a human P-TEFb kinase subunit, preferably comprising the amino acid sequence of 
SEQ ID NO:6. Alternatively, the human P-TEFb composition may comprise a human P-TEFb 
large subunit, preferably comprising the amino acid sequence of SEQ ID NO:45, SEQ ID NO:47 
or SEQ ID NO:50. The human P-TEFb composition may also comprise a human P-TEFb 
complex comprising a P-TEFb kinase subunit and a P-TEFb large subunit. 

Given that the studies of the present inventor demonstrate the general importance of 
human P-TEFb in viral transcription, and show binding to VP16, the invention further provides 
methods for identifying other transcriptional activator proteins, comprising contacting a 
composition suspected of containing a transcriptional activator protein with a human P-TEFb 
composition under conditions effective to allow the formation of bound protein complexes and 
detecting the bound protein complexes so formed. The detection of a protein that binds to the 
human P-TEFb composition is indicative of the identification of a transcriptional activator 
protein. 

Also provided in the present invention is a method for identifying a candidate viral 
transcription inhibitor, comprising testing a candidate substance for the ability to inhibit the 
binding of a viral transcriptional transactivator protein to a P-TEFb composition comprising a 
human P-TEFb subunit under effective binding conditions, wherein inhibition of binding is 
indicative of a positive candidate viral transcription inhibitor. The viral transcriptional 
transactivator protein may be attached to a solid support, as can the P-TEFb composition. The 
viral transcriptional transactivator protein may be an HIV Tat protein, an adenoviral E1A 
protein or a herpes virus VP16 protein. 

The P-TEFb composition may comprise a human P-TEFb kinase subunit, preferably 
comprising the amino acid sequence of SEQ ID NO:6. Alternatively, the P-TEFb composition 
may comprise a human P-TEFb large subunit, preferably comprising the amino acid sequence of 
SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50. In other aspects of the present invention, the 
P-TEFb composition comprises a human P-TEFb complex comprising a P-TEFb kinase subunit 
and a P-TEFb large subunit. 

Such assays are in effect "ELISA type" assays. Accordingly, the effective binding 
conditions are binding and washing conditions, as may be determined by contacting a 
composition comprising the viral transcriptional transactivator protein with the P-TEFb 
composition under conditions effective to allow the formation of bound protein complexes and 
detecting the bound protein complexes so formed after removing the non-specifically bound 
protein species. The bound protein complexes may be detected by means of a detectable label 
that is operatively attached to the viral transcriptional transactivator protein, or alternatively by 
means of a first anti-viral antibody that binds to the viral transcriptional transactivator protein, 
preferably wherein the first anti-viral antibody is operatively attached to a detectable label. Also, 
the bound protein complexes may be detected by means of a detectable label that is operatively 
attached to the human P-TEFb subunit. 

In other aspects, the bound protein complexes may be detected by means of a first 
anti-P-TEFb antibody that binds to the human P-TEFb subunit, whether the human P-TEFb kinase 
subunit, the human P-TEFb large subunit or an intact human P-TEFb enzyme complex, 
preferably wherein the first anti-P-TEFb antibody is operatively attached to a detectable label. 

In still further aspects of the present invention, the bound protein complexes may be 
detected by means of a first anti-viral or anti-P-TEFb antibody in combination with a second 
antibody, the second antibody having binding affinity for the first antibody and being operatively 
attached to a detectable label. The effective binding conditions may further comprise admixing a 
human cell nuclear extract with the viral transcriptional transactivator protein and the P-TEFb 
composition. 

The invention provides extended methods of confirming the identification of a candidate 
viral transcription inhibitor, which further comprise testing the positive candidate viral 
transcription inhibitor identified in a viral transcription elongation assay, wherein inhibition of 
viral transcription elongation is indicative of an active candidate viral transcription inhibitor. 
The viral transcription elongation assay may be described as comprising the steps of: 

a) preparing a transcriptionally competent composition capable of generating 
elongated viral RNA transcripts and comprising effective amounts of viral nucleic 
acid template, viral transcriptional transactivator protein, most preferably from the 
corresponding virus, P-TEFb enzyme complex, RNA polymerase II, each of the 
four nucleotides and ATP; and 

b) determining the ability of the transcriptionally competent composition to generate 
elongated viral RNA transcripts in the presence of the positive candidate viral 
transcription inhibitor and in the absence of the positive candidate inhibitor, 
wherein a reduction in the amount of elongated viral RNA or viral 
mRNA transcripts in the presence of the candidate is indicative of the 
identification or confirmation of an active candidate viral transcriptional inhibitor. 

In certain aspects, the method may further comprise testing the active candidate viral 
transcription inhibitor identified in a dual transcription elongation assay, wherein inhibition of 
viral transcription elongation only in the presence of the viral transcriptional transactivator 
protein confirms the identification of a viral transcription inhibitor. The dual transcription 
elongation assay may be described as comprising the steps of: 

a) preparing a first transcriptionally competent composition capable of generating 
elongated human RNA transcripts, the composition comprising effective amounts 
of human nucleic acid template, P-TEFb enzyme complex, RNA polymerase II, 
each of the four nucleotides and ATP; 

b) preparing a second transcriptionally competent composition capable of generating 
elongated viral RNA transcripts, the composition comprising effective amounts of 
viral nucleic acid template, viral transcriptional transactivator protein, most 
preferably from the corresponding virus, P-TEFb enzyme complex, RNA 
polymerase II, each of the four nucleotides and ATP; and 

c) identifying an active candidate viral transcription inhibitor that inhibits the 
generation of the elongated viral RNA transcripts by the second transcriptionally 
competent composition and that does not inhibit the generation of the elongated 
human RNA transcripts by the first transcriptionally competent composition. 
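
The selection in step (c) amounts to comparing the effect of a candidate on the two compositions. The following is a minimal Python sketch, offered for illustration only: it takes the amounts of elongated transcript measured with and without the candidate in the human and viral assays and reports whether the candidate reduces viral but not human transcripts. The function names and both cutoffs (at most 50% of viral transcripts remaining, at least 80% of human transcripts remaining) are assumptions made for the example and are not taken from the specification.

```python
def fraction_remaining(with_candidate: float, without_candidate: float) -> float:
    """Elongated transcript remaining in the presence of the candidate, as a fraction of control."""
    if without_candidate <= 0:
        raise ValueError("control reaction produced no elongated transcripts")
    return with_candidate / without_candidate


def is_selective_viral_inhibitor(human_without: float, human_with: float,
                                 viral_without: float, viral_with: float,
                                 viral_max_remaining: float = 0.5,
                                 human_min_remaining: float = 0.8) -> bool:
    """True if viral elongation is reduced while human elongation is largely preserved.

    Both cutoffs are assumed values chosen for illustration.
    """
    viral = fraction_remaining(viral_with, viral_without)
    human = fraction_remaining(human_with, human_without)
    return viral <= viral_max_remaining and human >= human_min_remaining


if __name__ == "__main__":
    # Hypothetical transcript signals (arbitrary units), not measured data.
    print(is_selective_viral_inhibitor(1000, 950, 1000, 200))  # viral-selective candidate
    print(is_selective_viral_inhibitor(1000, 300, 1000, 200))  # general transcription inhibitor
```

A candidate that lowers both signals would be a general transcriptional inhibitor rather than a viral-selective one, which is the distinction the dual assay is designed to draw.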

The second transcriptionally competent composition may alternatively comprise any viral 
template and any corresponding viral protein, as exemplified by an adenovirus or a herpes virus 
nucleic acid template and the transcriptional transactivator proteins, adenovirus E1A or herpes 
virus VP16. 

Although the pre-testing or pre-screening for effective candidate inhibitors, using one or 
more of the binding assays described above, is believed to be an effective strategy, such 
binding assays need not be conducted first in order to identify a candidate viral 
transcription inhibitor. Accordingly, the present invention further provides a method for 
identifying a candidate viral transcription inhibitor, comprising testing a candidate substance for 
the ability to inhibit viral RNA elongation in a functional viral transcription elongation assay, 
wherein inhibition of viral RNA elongation is indicative of an active candidate viral transcription 
inhibitor. 

Preferably, anti-viral screening methods comprise testing the active candidate viral 
transcription inhibitors in parallel human and viral transcription elongation assays, wherein the 
presence of the active candidate inhibitor reduces viral, but not human, transcription elongation 
in the parallel assays. 

Thus, the present invention provides viral transcription inhibitors, prepared by a process 
comprising testing a candidate viral transcription inhibitor substance for the ability to inhibit the 
binding of a viral transcriptional transactivator protein to a human P-TEFb composition and 
identifying a viral transcriptional inhibitor as a candidate substance that inhibits the binding 
under otherwise effective binding conditions. The inhibitors may be dispersed in a 
pharmaceutically acceptable medium, or admixed with a pharmaceutically acceptable diluent or 
excipient. 



The invention additionally provides an HIV inhibitor, or pharmaceutical formulation 
thereof, which may be prepared by a process that comprises the steps of: 

a) preparing a first transcriptionally competent composition capable of generating 
elongated human RNA transcripts, the composition comprising effective amounts 
of human nucleic acid template, P-TEFb enzyme complex, RNA polymerase II, 
each of the four nucleotides and ATP; 



b) preparing a second transcriptionally competent composition capable of generating 
elongated HIV RNA transcripts, the composition comprising effective amounts of 
HIV nucleic acid template, HIV Tat protein, P-TEFb enzyme complex, RNA 
polymerase II, each of the four nucleotides and ATP; and 

c) identifying an HIV inhibitor that inhibits the generation of elongated HIV RNA 
transcripts by the second transcriptionally competent composition but that does 
not inhibit the generation of elongated human RNA transcripts by the first 
transcriptionally competent composition. 



Also provided by the present invention are methods of inhibiting viral replication, as 
exemplified by HIV replication, comprising contacting a cell suspected of being infected with a 
virus, such as HIV, with an amount of the instant inhibitors effective to inhibit viral, or HIV, 
RNA elongation in the cell. The cell may be located within an animal, in which case a 
therapeutically effective amount of the inhibitor is administered to the animal. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The following drawings form part of the present specification and are included to further 
demonstrate certain aspects of the present invention. The invention may be better understood by 
reference to one or more of these drawings in combination with the detailed description of 
specific embodiments presented herein. 

FIG. 1. Requirement of the CTD for productive elongation. Quantitation of transcription 
by RNA polymerase II treated with chymotrypsin for the indicated times. Transcripts were 
loaded onto a silver stained 6-15% SDS polyacrylamide gel and autoradiographed. 
Autoradiographs were scanned using a BIO-RAD Model GS-670 Imaging Densitometer. Areas 
of the silver stained gel corresponding to subunit IIa, indicative of the intact largest polymerase 
subunit, were quantitated, normalized to the zero (0) digestion time and plotted as CTD 
remaining. The portions of the autoradiograph indicated as runoff (520 nucleotide runoff 
from the actin promoter) and abortive transcripts were quantitated and plotted in parallel. 

FIG. 2A and FIG. 2B. Inhibition of productive elongation by DRB and H-8. 
Transcription reactions in KcN extract were conducted for 20 min in the presence of either 
0.7 μM DRB (FIG. 2A) or 7 μM H-8 (FIG. 2B). Levels of runoff transcripts were 
quantitated using a Packard InstantImager™. Relative levels of runoff at 50% inhibition are 
shown. 

FIG. 3. Comparison of the composition and properties of P-TEFb and TFIIH. 
Quantitation of continuous labeling Kc-FT based transcription assays. Dried silver stained, 
6-15% SDS polyacrylamide gels were imaged using the Packard InstantImager™ and the 
portions of the gel indicated as runoff were quantitated and normalized to the no addition lanes. 

FIG. 4. Inhibition of processive elongation complexes by DRB and H8. The results of a 
pulse-chase transcription reaction with several DRB and H8 concentrations. The 2 min pulse +/- 
DRB and H8 was followed by a 45 sec. chase. The reactions were analyzed on a 6% acrylamide- 
TBE-urea gel. Dried polyacrylamide gels were imaged using a Packard InstantImager™, and 
the portion of the gel indicated as runoff was 
quantitated and expressed as percent (%) of the runoff compared to the control. The values were 
normalized by subtracting the drug-insensitive runoff found at the highest drug concentrations 
used. 
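
The normalization described for FIG. 4 can be sketched as follows. This is an illustrative sketch only: the function name and the example counts are hypothetical and are not data from the present specification, and it is assumed here that the drug-insensitive runoff is subtracted from both the control and the drug-treated values before the percentage is taken.

```python
def normalized_percent_runoff(counts, control):
    """Express runoff counts as percent of control after background subtraction.

    counts  -- runoff counts ordered by increasing drug concentration
    control -- runoff count for the reaction with no drug added
    The count at the highest drug concentration (the last element) is
    taken as the drug-insensitive runoff.
    """
    background = counts[-1]
    span = control - background
    if span <= 0:
        raise ValueError("control must exceed the drug-insensitive runoff")
    return [100.0 * (c - background) / span for c in counts]


# Hypothetical counts, for illustration only.
example = normalized_percent_runoff([900.0, 500.0, 100.0], control=1100.0)
```

With this convention the highest drug concentration is always plotted as 0% and the untreated control as 100%.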

FIG. 5. Effect of Drosophila P-TEFb and DNA titration on transcription. A 2 min
pulse/1 min chase was performed using HeLa nuclear extract and DNA template concentrations
of 5, 10, 20, and 40 ng/ml. Templates were preincubated for 20 min at 30°C in the presence or
absence of P-TEFb. The transcripts were analyzed on a 6% acrylamide-TBE-urea gel. The
labeled runoff, in CPM, obtained in the presence of DRB was subtracted from
the runoff obtained without DRB to give the DRB-sensitive runoff.
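
The subtraction described for FIG. 5 amounts to the following, shown as a minimal sketch. The function name and the CPM values are hypothetical placeholders and not measurements from the present specification.

```python
def drb_sensitive_runoff(cpm_without_drb, cpm_with_drb):
    """Return DRB-sensitive runoff (CPM) keyed by template concentration.

    Both arguments map a DNA template concentration (ng/ml) to the
    runoff counts measured without or with DRB, respectively.
    """
    if cpm_without_drb.keys() != cpm_with_drb.keys():
        raise ValueError("both measurements need the same concentrations")
    return {
        conc: cpm_without_drb[conc] - cpm_with_drb[conc]
        for conc in cpm_without_drb
    }


# Hypothetical counts at the four template concentrations named above.
without_drb = {5: 400, 10: 900, 20: 1500, 40: 1800}
with_drb = {5: 50, 10: 100, 20: 200, 40: 300}
sensitive = drb_sensitive_runoff(without_drb, with_drb)
```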

FIG. 6. Transcription reactions utilized either control depleted HeLa nuclear extract 
(cHNE), or PITALRE depleted extract (dHNE). Baculovirus produced recombinant human 
P-TEFb comprised of PITALRE and cyclin HBL1-1 (rhP-TEFb) was added to reactions as 
indicated. 

FIG. 7A and FIG. 7B. DRB and H-8 inhibition of transcription and human P-TEFb 
activity. Plot of radioactivity in runoff or pol IIo after quantitation using a Packard InstantImager
and normalization to the starting amount (100). (FIG. 7A) Transcription of HIV LTR template
(633-nt runoff). (FIG. 7B) CTD kinase assay using immunoprecipitated human P-TEFb and
Drosophila RNA polymerase II as substrate, as described by Price (Price et al., 1987; Price 1995).

FIG. 8. Quantitation data for in vitro Tat transactivation using a continuous labeling
protocol with the indicated templates and extracts, and for similar reaction mixtures that were
subjected to a 2 minute pulse, in which the short transcripts generated were less than 30
nucleotides long. The amount of runoff or short transcripts for each template/extract
combination was normalized to the corresponding lane with no DRB or Tat added. Runoff
transcripts were analyzed in a 6% TBE/urea gel.



SEQUENCE SUMMARY 



SEQ ID NO:1 full length cDNA of the Drosophila P-TEFb small (kinase) subunit

SEQ ID NO:2 amino acid sequence of the Drosophila P-TEFb small subunit

SEQ ID NO:3 full length cDNA of the Drosophila P-TEFb large subunit

SEQ ID NO:4 amino acid sequence of the Drosophila P-TEFb large subunit

SEQ ID NO:5 discovered to be the cDNA sequence of human small subunit

SEQ ID NO:6 discovered to be the amino acid sequence of human small subunit

SEQ ID NO:7 primer for Drosophila RNA polymerase II large subunit

SEQ ID NO:8 primer for Drosophila RNA polymerase II large subunit

SEQ ID NO:9 degenerate primer for the Drosophila P-TEFb small (kinase) subunit

SEQ ID NO:10 degenerate primer for the Drosophila P-TEFb small (kinase) subunit

SEQ ID NO:11 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:12 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:13 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:14 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:15 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:16 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:17 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:18 primer for the Drosophila P-TEFb small subunit

SEQ ID NO:19 degenerate primer for the Drosophila P-TEFb large subunit

SEQ ID NO:20 degenerate primer for the Drosophila P-TEFb large subunit

SEQ ID NO:21 degenerate primer for the Drosophila P-TEFb large subunit

SEQ ID NO:22 primer for the Drosophila P-TEFb large subunit

SEQ ID NO:23 primer for the Drosophila P-TEFb large subunit 

SEQ ID NO:24 primer for the Drosophila P-TEFb large subunit 

SEQ ID NO:25 primer for the Drosophila P-TEFb large subunit 

SEQ ID NO:26 primer for the Drosophila P-TEFb large subunit 

SEQ ID NO:27 primer for the Drosophila P-TEFb large subunit 

SEQ ID NO:28 primer for the Drosophila P-TEFb large subunit 

SEQ ID NO:29 primer for the Drosophila P-TEFb large subunit 

SEQ ID NO:30 primer for the Drosophila P-TEFb large subunit

SEQ ID NO:31 peptide sequence for part of the Drosophila P-TEFb small subunit

SEQ ID NO:32 peptide sequence for part of the Drosophila P-TEFb small subunit 

SEQ ID NO:33 peptide sequence for part of the Drosophila P-TEFb small subunit 

SEQ ID NO:34 peptide sequence for part of the Drosophila P-TEFb small subunit 

SEQ ID NO:35 peptide sequence for part of the Drosophila P-TEFb large subunit 

SEQ ID NO:36 peptide sequence for part of the Drosophila P-TEFb large subunit 

SEQ ID NO:37 peptide sequence for part of the Drosophila P-TEFb large subunit 

SEQ ID NO:38 peptide sequence for part of the Drosophila P-TEFb large subunit 

SEQ ID NO:39 peptide sequence for part of the Drosophila P-TEFb large subunit 

SEQ ID NO:40 peptide sequence for part of the Drosophila P-TEFb large subunit 

SEQ ID NO:41 primer for the human P-TEFb small subunit cDNA

SEQ ID NO:42 primer for the human P-TEFb small subunit cDNA 

SEQ ID NO:43 full length cDNA of the human P-TEFb large subunit HBL1 

SEQ ID NO:44 coding sequence of human P-TEFb large subunit clone HBL1-1 

SEQ ID NO:45 amino acid sequence of human P-TEFb large subunit clone HBL1-1

SEQ ID NO:46 coding sequence of human P-TEFb large subunit clone HBL1-2

SEQ ID NO:47 amino acid sequence of human P-TEFb large subunit clone HBL1-2

SEQ ID NO:48 full length cDNA of human P-TEFb large subunit clone HBL3 

SEQ ID NO:49 coding sequence of human P-TEFb large subunit clone HBL3 

SEQ ID NO:50 amino acid sequence of human P-TEFb large subunit clone HBL3

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS



The present invention addresses one or more shortcomings in the prior art through the 
cloning and further characterization of various subunits of the enzyme, termed positive 
transcription elongation factor b (P-TEFb), involved in the regulation of RNA elongation. It is
herein shown that viral processing, through the joint binding of viral proteins and the carboxyl
terminal domain (CTD) of RNA polymerase II and via the transfer of phosphate groups to RNA
polymerase II, is P-TEFb dependent. The invention relates particularly to the molecular cloning
of P-TEFb subunits, including human subunits, to the purification of the recombinant enzyme, to
compounds that are capable of inhibiting xenobiotic activators, in particular viral activators, of
RNA elongation, and to methods for the identification and use of further inhibitory compounds. 

A certain object of the present invention is therefore to provide methods for obtaining 
P-TEFb enzymes, by purification of the recombinant enzyme from host cells engineered to 
express the constituent subunits, which methods are proposed to be generally applicable to the
purification of all species of P-TEFb. 

It is an additional objective of the invention to provide methods for obtaining these 
enzymes in a relatively purified form, allowing their use in predictive assays for identifying 
compounds having the ability to reduce the activity of or inhibit viral-driven RNA elongation
activity, particularly in the context of retroviruses, such as HIV, and even adenoviruses. 

It is a still further object of the invention to identify classes of compounds which 
demonstrate inhibition of xenobiotic activators of RNA elongation along with a potential 
application of these compounds in the treatment of diseases connected with human
immunodeficiency virus (HIV), herpes simplex virus (HSV) and other viruses and retroviruses.

I. P-TEFb Functions 

The claimed invention provides a surprising link between the process of productive 
elongation and phosphorylation of the CTD of RNA polymerase II. The inventor found that
removal of the CTD by limited proteolysis prohibits the transition into productive elongation. In
correlation with this, the inventor found that a factor required for the transition into productive 
elongation, P-TEFb, is a CTD kinase that, in contrast to earlier candidates, is relevant in intact 
elongation systems. 

P-TEFb is an essential component of the positive transcription elongation system which 
regulates the production of long transcripts in vitro (Marshall and Price, 1992). The subunit 
composition and activity of P-TEFb do not match those of any published Drosophila or human
transcription factor. In particular, P-TEFb is unlike the elongation factors TFIIF (factor 5) or 
DmS-II because it does not stimulate the elongation rate of purified RNA polymerase on 
dC-tailed templates. P-TEFb was shown to be distinct from Drosophila TFIIH by its function in 
in vitro transcription, CTD kinase activity and polypeptide composition. Yankulov et al. (1995) 
proposed that the effect of DRB and H-8 on transcription was due to inhibition of the kinase 
activity of TFIIH and suggested that the TFIIH associated kinase controlled elongation by RNA 
polymerase II. The possibility of TFIIH playing a role in elongation control remains, but the 
present invention directly demonstrates that P-TEFb functions to control elongation and that 
TFIIH cannot substitute for P-TEFb. 

The conclusions of Yankulov et al. (1995) were based on the observation that DRB or 
H-8 inhibited the TFIIH kinase and the appearance of runoff transcripts during transcription with 
similar dose-response curves. The reported kinase assays were performed at 7.5 uM ATP and 
the transcription assays were performed at 500 uM NTPs. Because different triphosphate 
conditions were used in the two assays a valid comparison of inhibition curves cannot be made. 
It is likely that the DRB-sensitivity detected during transcription was due to inhibition of P-TEFb 
rather than TFIIH, although this conclusion could only be deduced after the discoveries of the 
present invention. 
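
One way to see why the differing triphosphate concentrations preclude a direct comparison is sketched below. It rests on an assumption that is not established in the passage above, namely that the inhibitor competes with ATP, in which case the Cheng-Prusoff relation IC50 = Ki (1 + [ATP]/Km) applies. The Ki and Km values are hypothetical placeholders; only the 7.5 uM and 500 uM concentrations come from the text.

```python
def apparent_ic50(ki, atp_conc, km_atp):
    """Cheng-Prusoff relation for a purely ATP-competitive inhibitor.

    All three arguments must share the same concentration unit (uM here).
    """
    if ki <= 0 or km_atp <= 0 or atp_conc < 0:
        raise ValueError("ki and km_atp must be positive, atp_conc non-negative")
    return ki * (1.0 + atp_conc / km_atp)


KI_UM = 1.0       # hypothetical inhibition constant
KM_ATP_UM = 10.0  # hypothetical Km of the kinase for ATP

ic50_kinase_assay = apparent_ic50(KI_UM, 7.5, KM_ATP_UM)     # 7.5 uM ATP
ic50_transcription = apparent_ic50(KI_UM, 500.0, KM_ATP_UM)  # 500 uM NTPs
fold_shift = ic50_transcription / ic50_kinase_assay
```

Under these assumed constants the same kinase would appear many-fold less sensitive at 500 uM than at 7.5 uM, so matching dose-response curves obtained under the two conditions would not by themselves identify a common target.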

A slight stimulation of the elongation rate of RNA polymerase II in a dC-tailed template 
assay was seen with some P-TEFb containing fractions, but the activity did not correlate with
P-TEFb activity. This activity was chromatographically separated from P-TEFb. Based on
chromatographic properties, the factor identified by Chodosh et al. (1989) in crude fractions
from HeLa nuclear extract was not P-TEFb, but possibly the human equivalent of factor 2.

Factor 6 was described as a dispensable factor that gave a quantitative stimulation of the
amount of runoff transcript seen in reconstructions of transcription initiation (Price et al., 1987).
Factor 6 was a DEAE flowthrough fraction from a 0.75 M HGKEDP (buffer comprising 25 mM
HEPES, 15% glycerol, gradient concentrations of KCl, 0.1 mM EDTA, 1 mM DTT, 0.1% of a
saturated solution of phenylmethylsulfonyl fluoride in isopropanol) step of the phosphocellulose
column. The fraction containing factor 6 contained 10-15% of the proteins in the nuclear
extract and was, therefore, very crude. Another stimulatory activity, factor 7, was also found in
the 0.75 M HGKEDP phosphocellulose fraction, and this factor bound to the DEAE (Price et al.,
1987).

Neither of the factors could be further purified at the time because they only had small
stimulatory effects and they could not be found during elution of further chromatographic steps.
Because both factor 6 and factor 7 had only small stimulatory effects on RNA elongation, it is
unlikely that either factor is P-TEFb, but given that a crude column extract was examined, it is
reasonable to surmise that, even though P-TEFb was not detected, it was present in the crude
extract as were potentially hundreds of other proteins.

The particular chromatographic properties of P-TEFb and the subunit composition of 
P-TEFb distinguish it from any of the known basal initiation factors with the possible exception 
of TFIIJ (Flores et al., 1992). At least one form of IIJ elutes from P-11 above 0.5 M KCl and the
purified factor has two subunits of 33 and 95 kDa, whereas P-TEFb elutes from P-11 below
0.5 M KCl and the subunits of P-TEFb are 46.8 kDa and 121 kDa, respectively, as calculated by
computer analysis with the genetic analysis program GeneRunner®. A HeLa ss-DNA agarose
column fraction that contains both IIA and IIJ does not substitute for P-TEFb.

P-TEFb or a similar protein may play a role in HIV-1 transcription since productive
elongation is similar to the stimulation of highly processive elongation complexes by Tat 
(Marciniak and Sharp, 1991). Recently, Tat-SF was identified as a factor required for Tat 
activation of HIV transcription (Zhou and Sharp, 1995). The chromatography of this protein on 
anion exchange resins is most similar to factor 2 or P-TEFa, not P-TEFb. The relation of Tat-SF
to P-TEFb is unclear. Recently, a kinase that can associate with the HIV-1 Tat protein was 
shown to have DRB-sensitive CTD kinase activity (Herrmann and Rice, 1995), but the protein 
was not purified or characterized. 

A specific interaction of HIV-1 and HIV-2 Tat proteins with a HeLa cell protein
kinase, termed Tat-associated kinase (TAK), has been demonstrated (Herrmann and Rice, 1993).
Tat is a viral protein that acts as an activator of transcription of the HIV viral transcription unit. It
binds to short nascent HIV transcripts and causes RNA polymerase II to synthesize mRNA sized
transcripts. It has been suggested that a CTD kinase associates with Tat in vivo. It has been
shown that human cell extracts contain a protein that is able to phosphorylate the CTD of the
large subunit of RNA polymerase II (Herrmann and Rice, 1995), but this is a long way from the
molecular characterization of "TAK". The TAK activity was inhibited by DRB, which has been
shown to selectively inhibit Tat function in vivo and in vitro (Herrmann and Rice, 1995;
Marciniak and Sharp, 1991).

Recent work by Yang et al. (1996) has shown that Tat proteins specifically associate with
TAK activity in vivo and require the CTD of RNA polymerase II for function. An unidentified
42 kDa protein (SEQ ID NO:6) was found to cochromatograph with the TAK activity. They 
speculated that this protein may be a subunit of TAK. The small subunit of P-TEFb has a similar 
size to this unidentified protein (Marshall and Price, 1995). Surprisingly, the inventor's
comparison of the cDNA sequence of the small subunit of P-TEFb (SEQ ID NO:1) with the
cDNA sequence SEQ ID NO:5 of the 42 kDa protein showed an unexpected 71% identity and
82% conserved similarity between the two proteins, suggesting to the inventor that the two
proteins are homologous to each other and that P-TEFb is homologous to TAK.

The present invention demonstrates that the human homolog of the small subunit of
Drosophila P-TEFb binds to a GST-Tat fusion protein, (GST-Tat 1 86R), but not to a GST-Tat
transactivation defective protein, (GST-Tat 1 86R P18IS) as described in the Examples herein.
The GST-Tat 1 86R/P-TEFb protein complex can phosphorylate the CTD of RNA polymerase
II, corroborating the conclusion that TAK is the human homolog of P-TEFb.
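
Percent identity and conserved similarity figures of the kind quoted above for the small subunit comparison are computed from a pairwise alignment. The following is a minimal sketch of that arithmetic only; it assumes the two sequences have already been aligned (gaps written as "-"), and the conservative substitution groups and the function name are hypothetical choices for illustration, not the scheme used to obtain the 71% and 82% values.

```python
# Hypothetical conservative-substitution groups, for illustration only.
SIMILAR_GROUPS = ("ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap are not scored
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy aligned peptides, not taken from any SEQ ID NO.
identity, similarity = identity_and_similarity("ACDE-K", "ACEEGR")
```

Similarity is always at least as high as identity because every identical column also counts as similar.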

Phosphorylation of the CTD of the large subunit of RNA polymerase II normally occurs 
during the late stages of initiation or early during elongation (Dahmus, 1994; Dahmus, 1995). 
An intact CTD is required for Drosophila elongation complexes to escape early blocks to
elongation and enter productive elongation. The serine and threonine rich CTD is a substrate for
a large number of kinases (Dahmus, 1994). However, those working in this field were, prior to
the present invention, unable to determine which kinase(s) actually phosphorylates the CTD in
vivo with functional consequences.

Recently, Tat-SF was proposed as a factor required for Tat activation of HIV
transcription (Zhou and Sharp, 1995). Based on the chromatographic analysis of Tat-SF, it
appears to be most similar to factor 2 or P-TEFa, two factors which stimulate RNA elongation
but do not appear to be essential for RNA elongation to occur. The present invention is therefore
surprising in that it clarifies the confusion in the prior art by providing a purified kinase which is
essential for the transition into productive RNA elongation. This protein, named P-TEFb, has
some homology to cyclin proteins although this similarity could not have been predicted prior to
cloning and phosphorylating the CTD of RNA polymerase II.

The present invention shows that P-TEFb is distinct from a number of CTD kinases and
in vivo studies of the present inventor demonstrate that the transition into productive elongation
is controlled by P-TEFb-dependent phosphorylation of the CTD. Subunit composition, DRB
sensitivity and functional properties distinguish P-TEFb from TFIIH or the TFIIH associated
kinase. P-TEFb functions during elongation and is not stably associated with the transcription
complex at any time (Kephart et al., 1992; Marshall and Price, 1992; Marshall and Price, 1995).
This further discriminates between P-TEFb and TFIIH as well as the kinase associated with the
SRB complex and DNA PK, which are part of the initiation complex or otherwise bound to the
template. The subunit composition and chromatographic properties of P-TEFb are not similar to
those of any known kinase. Anti-phosphotyrosine antibodies failed to react with RNA polymerase II
phosphorylated by P-TEFb, indicating that P-TEFb is a serine/threonine kinase.

The CTD is required for transcription in vivo (Zehring et al., 1988; Nonet et al., 1987;
Bartolomei et al., 1988; Allison et al., 1988; Brickey and Greenleaf, 1995), but demonstrating a
general requirement for the CTD in vitro requires the use of crude extracts (Li and Kornberg,
1994) or a more purified system which includes specific elongation control factors (Marshall and
Price, 1995). However, in more purified systems lacking the elongation control factors some
promoters still demonstrate a requirement for the CTD. The CTD is not required for
transcription from the Drosophila actin 5C promoter in a Drosophila system (Zehring et al.,
1988) or from the Ad-2 ML promoter in a HeLa system (Kim and Dahmus, 1989). The CTD is
required for transcription from the murine DHFR promoter in a HeLa system (Kang and
Dahmus, 1993; Thompson et al., 1989; Akoulitchev et al., 1995). Akoulitchev et al. (1995)
showed that an early step in initiation from the DHFR promoter (formation of the first
phosphodiester bond) required the CTD.

A model for the control of elongation has been described which is based on results
obtained from a Drosophila in vitro transcription system (Kephart et al., 1992; Marshall and
Price, 1992) and is consistent with data obtained in vitro and in vivo from many studies. Key
features of the model are that all RNA polymerase II molecules that initiate from a promoter are
destined to produce only short transcripts in a process termed abortive elongation. Abortive
elongation is distinct from abortive initiation because the abortive transcripts are 10 to 20 times
longer during abortive elongation and presumably the polymerase in the abortive elongation
complexes must relocate the promoter after producing an abortive transcript to bring about
reinitiation. Escape from this negative control is accomplished through the action of the positive
transcription elongation system which allows productive elongation.

Elongation control can be divided into two temporally differentiated stages during which
specific elongation factors function. The transition into productive elongation occurs during the 
first stage. Negative transcription elongation factors (N-TEF) influence RNA polymerase II 
molecules that have initiated from a promoter causing them to enter abortive elongation which is 
characterized by the generation of short transcripts (Kephart et al., 1992; Marshall and Price,
1992). Some of the components of N-TEF seem to be associated with the preinitiation complex
(Marshall and Price, 1992), but one component, factor 2, acts after initiation as an RNA
polymerase II transcript release factor (Xie and Price, 1996). The generation of highly
processive elongation complexes involves several P-TEFs (Marshall and Price, 1995).

The second stage of the elongation control process occurs after RNA polymerase II has
entered productive elongation at which point the polymerase may be further affected by other
factors. The first factor identified, S-II, suppresses pausing by RNA polymerase II at specific
sites (Reinberg and Roeder, 1987; SivaRaman et al., 1990; Sluder et al., 1989) through a
transcript cleavage mechanism (Guo and Price, 1993; Izban and Luse, 1993; Reines, 1992). The
required initiation factor TFIIF (Drosophila factor 5 or RAP 30/74), also stimulates the
elongation rate of RNA polymerase II (Bengal et al., 1991; Burton et al., 1988; Flores et al.,
1989; Kephart et al., 1994; Price et al., 1989). Other factors, TFIIX (Bengal et al., 1991), S-III
(Bradsher et al., 1993a; Bradsher et al., 1993b), and ELL (Shilatifard et al., 1996) stimulate
elongation in a manner similar to TFIIF.

A number of Drosophila genes including HSP70 have early blocked polymerases in the
IIa form, while downstream elongating polymerases are highly phosphorylated (O'Brien et al.,
1994). In injected Xenopus oocytes, GAL4 based activators stimulated elongation in a
DRB-sensitive manner (Yankulov et al., 1994). It is possible that transcriptional activators work with
or through P-TEFb. Dubois et al. (1994) found that addition of DRB or H-8 to HeLa cells
caused a rapid decrease in the amount of RNA polymerase IIo. Egyhazi et al. (1996) recently
showed that DRB inhibited the phosphorylation of RNA polymerase II to the IIo form in
Chironomus tentans. Their results indicated that addition of DRB immediately stopped the
incorporation of phosphate into the polymerase, but that polymerases in productive elongation
complexes maintained their hyperphosphorylated states until they completed their current round
of transcription.

The substrate specificity of P-TEFb and the kinetics of CTD phosphorylation by P-TEFb
were determined under a variety of conditions as described herein in the Examples. The
nucleotide requirements of P-TEFb CTD kinase were similar to those reported for HeLa TFIIH
(Lu et al., 1992), CTD-K1 (Payne and Dahmus, 1993), and rat liver δ (Serizawa et al., 1992), but
were different from those for yeast factor b (Feaver et al., 1991) and CTD-K2 (Payne and
Dahmus, 1993). The kinetics of phosphorylation using different concentrations of kinase are not
consistent with processive P-TEFb action during the phosphorylation of purified RNA
polymerase II. P-TEFb does prefer to phosphorylate a CTD that has already been
phosphorylated, which may have implications for its function after initiation.

II. P-TEFb Genes and Expression

A. DNA Segments

Important aspects of the present invention concern isolated DNA segments and
recombinant vectors encoding P-TEFb, and the creation and use of recombinant host cells,
through the application of DNA technology, that express P-TEFb. DNA segments, recombinant
vectors, recombinant host cells and expression methods using sequences of the P-TEFb small
subunit and large subunit are also provided.

Each of the foregoing genes is included within all aspects of the following description.
The present invention concerns DNA segments, isolatable from insect, mammalian and human
cells, that are free from total genomic DNA and that are capable of expressing a P-TEFb protein
or polypeptide subunit that has kinase activity, or that has similarity to cyclin proteins, both of
which subunits are essential for productive RNA elongation. Such proteins will herein be termed
"P-TEFb proteins".




As used herein, the term "DNA segment" refers to a DNA molecule that has been isolated 
free of total genomic DNA of a particular species. Therefore, a DNA segment encoding a 
P-TEFb subunit refers to a DNA segment that contains P-TEFb coding sequences yet is isolated 
away from, or purified free from, total mammalian or human genomic DNA. Included within the 
term "DNA segment" are DNA segments and smaller fragments of such segments, and also
recombinant vectors, including, for example, plasmids, cosmids, phage, viruses, and the like. 

Similarly, a DNA segment comprising an isolated or purified P-TEFb kinase or large 
subunit protein gene refers to a DNA segment including P-TEFb kinase or large subunit protein 
coding sequences and, in certain aspects, regulatory sequences, isolated substantially away from
other naturally occurring genes or protein encoding sequences. In this respect, the term "gene" is
used for simplicity to refer to a functional protein, polypeptide or peptide encoding unit. As will
be understood by those in the art, this functional term includes genomic sequences,
complementary DNA (cDNA) sequences and smaller engineered gene segments that express, or
may be adapted to express, P-TEFb proteins, polypeptides, domains, peptides, fusion proteins
and mutants.

"Isolated substantially away from other coding sequences" means that the gene of
interest, in this case a P-TEFb kinase or large subunit protein gene, forms the significant part of
the coding region of the DNA segment, and that the DNA segment does not contain large
portions of naturally-occurring coding DNA, such as large chromosomal fragments or other
functional genes or cDNA coding regions. Of course, this refers to the DNA segment as 
originally isolated, and does not exclude genes or coding regions later added to the segment by 
the hand of man. 

In particular embodiments, as described in detail in the foregoing 'Summary' section, the
invention concerns isolated DNA segments and recombinant vectors that encode substantially
full length P-TEFb kinase subunit proteins or polypeptides that include a contiguous amino acid
sequence of at least about 24 amino acids, and more preferably, of at least about 25, 27, 30, 35,
40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200 amino acids or so from SEQ ID NO:2, or a
biologically functional equivalent thereof. Even more preferably, the genes encode a protein
with a sequence essentially as set forth in SEQ ID NO:2. 

Substantially full length P-TEFb kinase subunit genes preferably include a contiguous 
nucleic acid sequence of at least about 722 nucleotides, and more preferably, of at least about
725, 750, 800, 825, 850, 900 or so nucleotides from between position 115 and position 1326 or
1327 of SEQ ID NO:1, or a biologically functional equivalent thereof. More preferably, the
isolated genes and DNA segments have a nucleic acid sequence essentially as set forth in
SEQ ID NO:1.
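
The notion of a contiguous stretch of at least a given length drawn from a stated region of a reference sequence can be made concrete as below. This is a hedged sketch and not a test prescribed by the specification: the function name is hypothetical, positions are taken to be 1-based and inclusive, only exact matches are considered (functional equivalents are ignored), and the sequences in the example are toy strings rather than SEQ ID NO:1.

```python
def longest_shared_run(candidate, reference, start, end):
    """Length of the longest contiguous stretch of `candidate` that also
    occurs within reference[start..end] (1-based, inclusive positions)."""
    if not 1 <= start <= end <= len(reference):
        raise ValueError("start and end must lie within the reference")
    region = reference[start - 1:end]
    best = 0
    prev = [0] * (len(region) + 1)
    for c in candidate:
        # curr[j] = length of the common run ending at c and region[j - 1]
        curr = [0] * (len(region) + 1)
        for j, r in enumerate(region, start=1):
            if c == r:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Toy example: the shared run "ACGTA" has length 5.
run_length = longest_shared_run("GGACGTAA", "TTACGTACC", start=1, end=9)
meets_threshold = run_length >= 5
```

For the thresholds recited above, the same comparison would be made against the region between positions 115 and 1326 or 1327 with a minimum run of about 722 nucleotides.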

The substantially full length P-TEFb large subunit genes of the invention generally
encode a protein or polypeptide that includes a contiguous amino acid sequence of at least about
6 or 7 amino acids or so, or more preferably, of at least about 8, 10, 12, 14, 16, 18, 20, 22 or
25 amino acids or so from SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or
a biologically functional equivalent thereof; or the genes and DNA segments hybridize to such a
coding sequence under stringent hybridization conditions.

More preferably, the substantially full length P-TEFb large subunit genes encode a
P-TEFb large subunit protein that includes a contiguous amino acid sequence of at least about
30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200 amino acids or so from SEQ ID NO:4,
SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or a biologically functional equivalent
thereof; or hybridize to such a coding sequence under stringent hybridization conditions. Most
preferably, the substantially full length P-TEFb large subunit genes encode a P-TEFb large
subunit having a sequence essentially as set forth in the amino acid sequence of SEQ ID NO:4,
SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50, or a biologically functional equivalent
thereof; or hybridize to such a coding sequence under stringent hybridization conditions.

In certain embodiments, the substantially full length P-TEFb large subunit genes include
a contiguous nucleic acid sequence of at least about 120, 150 or 200 or so nucleotides from
between position 716 and position 4006 of SEQ ID NO:3, or include a contiguous nucleic acid
sequence of at least about 360, 400, 450 or 500 or so nucleotides from a coding region from
SEQ ID NO:43 or SEQ ID NO:48 (e.g., SEQ ID NO:44, SEQ ID NO:46 or SEQ ID NO:49); or a
biologically functional equivalent thereof; or will hybridize to such a coding sequence under
stringent hybridization conditions.

Most preferably, the substantially full length P-TEFb large subunit genes comprise an
isolated coding region having a nucleic acid sequence essentially as set forth in SEQ ID NO:3,
SEQ ID NO:44, SEQ ID NO:46 or SEQ ID NO:49, or a biologically functional equivalent
thereof; or will hybridize to such a coding sequence under stringent hybridization conditions.

The term "a sequence essentially as set forth in SEQ ID NO:2, SEQ ID NO:4, SEQ ID 
NO:45, SEQ ID NO:47 or SEQ ID NO:50" means that the sequence substantially corresponds to 
a portion of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50 
and has relatively few amino acids that are not identical to, or a biologically functional
equivalent of, the amino acids of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47
or SEQ ID NO:50. The term "biologically functional equivalent" is well understood in the art
and is further defined in detail herein. Accordingly, sequences that have between about 85% and
about 90%; or more preferably, between about 91% and about 95%; or even more preferably,
between about 96% and about 99%; of amino acids that are identical or functionally equivalent
to the amino acids of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID
NO:50 will be sequences that are "essentially as set forth in SEQ ID NO:2, SEQ ID NO:4, SEQ 
ID NO:45, SEQ ID NO:47 or SEQ ID NO:50", provided the biological activity of the protein is 
maintained. 
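
The tiered percentages in the preceding definition can be read as a simple decision rule, sketched below. The function name and tier labels are hypothetical, the word "about" is treated as an exact cutoff purely for simplicity, and only the lower bound of each range is used so that values falling between the stated ranges are still assigned a tier.

```python
def essentially_as_set_forth(percent_equivalent, activity_maintained):
    """Return a tier label, or None if the sequence does not qualify.

    percent_equivalent  -- percent of amino acids identical or
                           functionally equivalent to the reference
    activity_maintained -- whether biological activity is retained
    """
    if not 0.0 <= percent_equivalent <= 100.0:
        raise ValueError("percent_equivalent must be between 0 and 100")
    if not activity_maintained:
        return None  # the proviso on biological activity is not met
    if percent_equivalent >= 96.0:
        return "even more preferred"
    if percent_equivalent >= 91.0:
        return "more preferred"
    if percent_equivalent >= 85.0:
        return "qualifies"
    return None


tier = essentially_as_set_forth(93.0, activity_maintained=True)
```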

In certain other embodiments, the invention concerns isolated DNA segments and
recombinant vectors that include within their sequence a nucleic acid sequence essentially as set
forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48. The
term "essentially as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or
SEQ ID NO:48" is used in the same sense as described above and means that the nucleic acid
sequence substantially corresponds to a portion of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43,
SEQ ID NO:46 or SEQ ID NO:48 and has relatively few codons that are not identical, or
functionally equivalent, to the codons of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID
NO:46 or SEQ ID NO:48. Again, DNA segments that encode proteins exhibiting kinase activity
or large functional subunits will be most preferred.

The term "functionally equivalent codon" is used herein to refer to codons that encode the 
same amino acid, such as the six codons for arginine or serine, and also refers to codons that 
encode biologically equivalent amino acids (see Codon Table, below). 
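
The first sense of "functionally equivalent codon" (codons that encode the same amino acid) can be illustrated with a short sketch. The dictionary below simply transcribes the Codon Table; the function name `same_amino_acid` is hypothetical, and the second sense of the term (codons for biologically equivalent amino acids) is deliberately not modeled here.

```python
# Codon Table transcribed as a mapping from amino acid to its RNA codons.
CODON_TABLE = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Cys": ["UGC", "UGU"],
    "Asp": ["GAC", "GAU"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["UUC", "UUU"],
    "Gly": ["GGA", "GGC", "GGG", "GGU"],
    "His": ["CAC", "CAU"],
    "Ile": ["AUA", "AUC", "AUU"],
    "Lys": ["AAA", "AAG"],
    "Leu": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"],
    "Met": ["AUG"],
    "Asn": ["AAC", "AAU"],
    "Pro": ["CCA", "CCC", "CCG", "CCU"],
    "Gln": ["CAA", "CAG"],
    "Arg": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "Ser": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "Thr": ["ACA", "ACC", "ACG", "ACU"],
    "Val": ["GUA", "GUC", "GUG", "GUU"],
    "Trp": ["UGG"],
    "Tyr": ["UAC", "UAU"],
}

# Inverted lookup: codon -> amino acid (stop codons are absent from the table).
CODON_TO_AA = {codon: aa for aa, codons in CODON_TABLE.items() for codon in codons}


def same_amino_acid(codon_a: str, codon_b: str) -> bool:
    """Return True when both codons encode the same amino acid.

    DNA-style codons (with T) are accepted and converted to RNA (U).
    Codons missing from the table, such as stop codons, give False.
    """
    a = codon_a.upper().replace("T", "U")
    b = codon_b.upper().replace("T", "U")
    aa = CODON_TO_AA.get(a)
    return aa is not None and aa == CODON_TO_AA.get(b)
```

For instance, `same_amino_acid("AGA", "CGU")` compares two of the six arginine codons, whereas `same_amino_acid("GAA", "GAC")` compares a glutamic acid codon with an aspartic acid codon.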



                          CODON TABLE

Amino Acids                     Codons

Alanine          Ala    A       GCA  GCC  GCG  GCU
Cysteine         Cys    C       UGC  UGU
Aspartic acid    Asp    D       GAC  GAU
Glutamic acid    Glu    E       GAA  GAG
Phenylalanine    Phe    F       UUC  UUU
Glycine          Gly    G       GGA  GGC  GGG  GGU
Histidine        His    H       CAC  CAU
Isoleucine       Ile    I       AUA  AUC  AUU
Lysine           Lys    K       AAA  AAG
Leucine          Leu    L       UUA  UUG  CUA  CUC  CUG  CUU
Methionine       Met    M       AUG
Asparagine       Asn    N       AAC  AAU
Proline          Pro    P       CCA  CCC  CCG  CCU
Glutamine        Gln    Q       CAA  CAG
Arginine         Arg    R       AGA  AGG  CGA  CGC  CGG  CGU
Serine           Ser    S       AGC  AGU  UCA  UCC  UCG  UCU
Threonine        Thr    T       ACA  ACC  ACG  ACU
Valine           Val    V       GUA  GUC  GUG  GUU
Tryptophan       Trp    W       UGG
Tyrosine         Tyr    Y       UAC  UAU

It will also be understood that amino acid and nucleic acid sequences may include 
additional residues, such as additional N- or C-terminal amino acids or 5' or 3' sequences, and 
yet still be essentially as set forth in one of the sequences disclosed herein, so long as the 
sequence meets the criteria set forth above, including the maintenance of biological protein 
activity where protein expression is concerned. The addition of terminal sequences particularly 
applies to nucleic acid sequences that may, for example, include various non-coding sequences 
flanking either of the 5' or 3' portions of the coding region or may include various internal 
sequences, i.e., introns, which are known to occur within genes. 

Excepting intronic or flanking regions, and allowing for the degeneracy of the genetic 
code, sequences that have between about 75% and about 79%; or more preferably, between about 
80% and about 89%; or even more preferably, between about 90% and about 99%; of nucleotides 
that are identical to the nucleotides of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID 
NO:46 or SEQ ID NO:48 will be sequences that are "essentially as set forth in SEQ ID NO:1, 
SEQ ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48". 
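
As a rough illustration of the nucleotide identity ranges just described, the following sketch computes percent identity between two sequences and maps the result onto the three stated ranges. It assumes the sequences are already aligned, of equal length and free of gaps, and it ignores the allowance for codon degeneracy; the function names, the tier labels and the treatment of values above 99% as belonging to the top range are assumptions made for the example only.

```python
from typing import Optional


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences are identical."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def identity_tier(pct: float) -> Optional[str]:
    """Map a percent identity onto the ranges named in the text.

    about 75-79%  -> "preferred"
    about 80-89%  -> "more preferred"
    about 90-99%  -> "even more preferred" (values above 99% are also
                     placed here, which is an assumption of this sketch)
    below 75%     -> None
    """
    if pct >= 90.0:
        return "even more preferred"
    if pct >= 80.0:
        return "more preferred"
    if pct >= 75.0:
        return "preferred"
    return None
```

A real comparison would first align the two sequences with an alignment tool; only the final counting step is shown here.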

Sequences that are essentially the same as those set forth in SEQ ID NO:1, SEQ ID NO:3, 
SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48 may also be functionally defined as 
sequences that are capable of hybridizing to a nucleic acid segment containing the complement 
of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48 under 
relatively stringent conditions. Suitable relatively stringent hybridization conditions will be well 
known to those of skill in the art. 

Suitable standard hybridization conditions for the present invention include, for example, 
hybridization in 50% formamide, 5x Denhardt's solution, 5x SSC, 25 mM sodium phosphate, 
0.1% SDS and 100 µg/ml of denatured salmon sperm DNA at 42°C for 16 h followed by 1 h 
sequential washes with 0.1x SSC, 0.1% SDS solution at 60°C to remove the desired amount of 
background signal. Lower stringency hybridization conditions for the present invention include, 
for example, hybridization in 35% formamide, 5x Denhardt's solution, 5x SSC, 25 mM sodium 
phosphate, 0.1% SDS and 100 µg/ml denatured salmon sperm DNA or E. coli DNA at 42°C for 
16 h followed by sequential washes with 0.8x SSC, 0.1% SDS at 55°C. Those of skill in the art 
will recognize that conditions can be readily adjusted to obtain the desired level of stringency. 

Naturally, the present invention also encompasses DNA segments that are 
complementary, or essentially complementary, to the sequence set forth in SEQ ID NO:1, SEQ 
ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or SEQ ID NO:48. Nucleic acid sequences that are 
"complementary" are those that are capable of base-pairing according to the standard Watson-Crick 
complementarity rules. As used herein, the term "complementary sequences" means 
nucleic acid sequences that are substantially complementary, as may be assessed by the same 
nucleotide comparison set forth above, or as defined as being capable of hybridizing to the 
nucleic acid segment of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or SEQ 
ID NO:48 under relatively stringent conditions such as those described immediately above. 
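
The Watson-Crick complementarity rules referred to above can be sketched in a few lines. The example covers only exact complementarity of DNA strands written 5' to 3'; "substantially complementary" sequences would need the percent identity comparison or a hybridization test instead. The function names are illustrative and do not come from this disclosure.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_exact_complement(strand_a: str, strand_b: str) -> bool:
    """True when strand_b base-pairs with strand_a at every position."""
    return reverse_complement(strand_a) == strand_b.upper()
```

For example, the strand ATGCC pairs with GGCAT when both are read 5' to 3'.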

The nucleic acid segments of the present invention, regardless of the length of the coding 
sequence itself, may be combined with other DNA sequences, such as promoters, 
polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding 
segments, and the like, such that their overall length may vary considerably. It is therefore 
contemplated that a nucleic acid fragment of almost any length may be employed, with the total 
length preferably being limited by the ease of preparation and use in the intended recombinant 
DNA protocol. 

For example, nucleic acid fragments may be prepared from any region of the presently 
disclosed sequences that include contiguous sequences characterized as follows: about 117-125 
or about 126-135 nucleotides of SEQ ID NO:3; about 315-320 or about 321-330 nucleotides of 
SEQ ID NO:48; about 360-370 or about 371-380 nucleotides of either SEQ ID NO:43 or SEQ ID 
NO:46. The sequences may be up to about 30,000 or 20,000, or about 10,000, or about 5,000 
base pairs in length, with segments of about 3,000 being preferred in certain cases. 

DNA segments with total lengths of about 1,000, about 500, about 300, about 200, and 
about 150 base pairs in length (including all intermediate lengths) are also contemplated to be 
useful for certain sequences as described above. For the smaller fragments, probes and primers, 
the sequences will generally be selected from the following defined regions: between positions 
1-258, 320-345 and 1244-1457 of SEQ ID NO:1; positions 587-964, 1156-1711, 1764-3287, 
3460-3775 and 3800-4328 of SEQ ID NO:3; and preferably positions 1-244, 297-546, 867-1142, 
1895-2331, 2821-2890, 3341-3442, 3953-3860 and 4491-4528 of SEQ ID NO:43; and 
positions 1-209, 418-667, 919-1031, 2045-2164 and 2219-2360 of SEQ ID NO:48. 

It will be readily understood that "intermediate lengths", in these contexts, means any 
length between the quoted ranges, bearing in mind the stated constraints on the lengths and 
regions of complementarity. For example, 19, 20, 21, etc.; 24, 25, 26, etc.; 29, 30, 31, etc.; 48, 
49, 50, 51, etc.; 75, 76, 77, 78, 79, 80 etc.; 100, 101, 102, 103 etc.; 118, 119, 120, 121 etc.; 127, 
128, 129, 130, 131, etc.; 316, 317, 318, 319, etc.; 322, 323, 324, 325, 326, etc.; 361, 362, 363, 
364, etc.; 372, 373, 374, 375, etc.; including all integers through the 400-500; 500-1,000; 1,000- 
2,000; 2,000-3,000; 3,000-5,000; 5,000-10,000 ranges, up to and including sequences of about 
12,001, 12,002, 13,001, 13,002, 15,000, 20,000, 30,000 and the like as is appropriate for each 
sequence. 

The various probes and primers designed around the disclosed nucleotide sequences of the 
present invention may be of any length. By assigning numeric values to a sequence, for example, 
the first residue is 1, the second residue is 2, etc., an algorithm defining all primers can be proposed: 

        n to n + y 

where n is an integer from 1 to the last number of the sequence and y is the length of the 
primer minus one, where n + y does not exceed the last number of the sequence. Thus, for a 
125-mer, the probes correspond to bases 1 to 125, 2 to 126, 3 to 127 ... and so on. For a 315-mer, 
the probes correspond to bases 1 to 315, 2 to 316, 3 to 317 ... and so on. For a 360-mer, the probes 
correspond to bases 1 to 360, 2 to 361, 3 to 362 ... and so on. 
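
The "n to n + y" rule can be written out as a short sketch that enumerates every primer or probe position of a given length along a sequence. Positions are 1-based, as in the text; the function names `primer_windows` and `primer_at` are hypothetical.

```python
from typing import Iterator, Tuple


def primer_windows(sequence_length: int, primer_length: int) -> Iterator[Tuple[int, int]]:
    """Yield (n, n + y) for every primer of the given length.

    n runs from 1 upward and y is the primer length minus one; a window
    is produced only while n + y does not exceed the sequence length.
    """
    if primer_length < 1:
        raise ValueError("primer_length must be at least 1")
    y = primer_length - 1
    for n in range(1, sequence_length - y + 1):
        yield (n, n + y)


def primer_at(sequence: str, n: int, primer_length: int) -> str:
    """Return the bases n to n + y of a sequence (1-based, inclusive)."""
    return sequence[n - 1 : n - 1 + primer_length]
```

With a 125-mer, the first three windows are (1, 125), (2, 126) and (3, 127), matching the enumeration given above.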

It will also be understood that this invention is not limited to the particular nucleic acid 
and amino acid sequences of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or 
SEQ ID NO:48. Recombinant vectors and isolated DNA segments may therefore variously 
include these coding regions themselves, coding regions bearing selected alterations or 
modifications in the basic coding region, or they may encode larger polypeptides that 
nevertheless include such coding regions or may encode biologically functional equivalent 
proteins or peptides that have variant amino acid sequences. 

The DNA segments of the present invention encompass biologically functional 
equivalent P-TEFb and kinase proteins and peptides. Such sequences may arise as a 
consequence of codon redundancy and functional equivalency that are known to occur naturally 
within nucleic acid sequences and the proteins thus encoded. Alternatively, functionally 
equivalent proteins or peptides may be created via the application of recombinant DNA 
technology, in which changes in the protein structure may be engineered, based on 
considerations of the properties of the amino acids being exchanged. Changes designed by man 
may be introduced through the application of site-directed mutagenesis techniques, e.g., to 
introduce improvements to the antigenicity of the protein or to test mutants in order to examine 
transcription, elongation or Tat binding activity at the molecular level. 

One may also prepare fusion proteins and peptides, e.g., where the P-TEFb or kinase 
protein coding regions are aligned within the same expression unit with other proteins or 
peptides having desired functions, such as for purification or immunodetection purposes (e.g., 
proteins that may be purified by affinity chromatography and enzyme label coding regions, 
respectively). 

Encompassed by the invention are DNA segments encoding relatively small peptides, 
such as, for example, peptides of from about 7 to about 50 amino acids in length, and more 
preferably, of from about 10 to about 30 amino acids in length; and also larger polypeptides up to 
and including proteins corresponding to the full-length sequences set forth in SEQ ID NO:2, 
SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 and SEQ ID NO:50. 

The DNA segments of the present invention may be employed for a variety of 
applications. For example, a particularly useful application concerns the recombinant production 
of the individual subunits or proteins or peptides whose structure is derived from that of the 
subunits, or in the recombinant production of the holoenzyme following co-expression of the two 
subunits. Additionally, the P-TEFb-encoding DNA segments of the present invention can also 
be used in the preparation of nucleic acid probes or primers, which can, for example, be used in 
the identification and cloning of P-TEFb genes or related genomic sequences, or in the study of 
subunit(s) expression, and the like. 

B. Nucleic Acid Detection 

In addition to their use in directing the expression of the P-TEFb kinase or large subunit 
protein, the nucleic acid sequences disclosed herein also have a variety of other uses. For 
example, they also have utility as probes or primers in nucleic acid hybridization embodiments. 

1. Hybridization 

The use of a hybridization probe, e.g., of about 14-20, 25-30, 50, 75, 100, 120, 150, 200, 
250, 300 or 360 or so nucleotides in length allows the formation of a duplex molecule that is both 
stable and selective. Molecules having complementary sequences over stretches greater than 20 
bases in length are generally preferred, in order to increase stability and selectivity of the hybrid, 
and thereby improve the quality and degree of particular hybrid molecules obtained. In using 
sequences from any region of those disclosed herein, one may generally prefer to design nucleic 
acid molecules having stretches of 120 to 360 nucleotides, or even longer where desired. By 
choosing from more unique regions, as already disclosed herein, those of skill in the art will 
appreciate that smaller probes and primers may be designed and utilized, e.g., of between about 
14-20, 25, 30, 35 or so nucleotides in length. 

All fragments may be readily prepared by, for example, directly synthesizing the fragment 
by chemical means or by introducing selected sequences into recombinant vectors for recombinant 
production. The chemical means can include the PCR™ technology of U.S. Patent 4,603,102 (herein 
incorporated by reference). 

Accordingly, the nucleotide sequences of the invention may be used for their ability to 
selectively form duplex molecules with complementary stretches of genes or RNAs or to provide 
primers for amplification of DNA or RNA from tissues. Depending on the application envisioned, 
one will desire to employ varying conditions of hybridization to achieve varying degrees of 
selectivity of probe towards target sequence. 

For certain applications, for example, substitution of nucleotides by site-directed 
mutagenesis, it is appreciated that lower stringency conditions are required. Under these 
conditions, hybridization may occur even though the sequences of probe and target strand are not 
perfectly complementary, but are mismatched at one or more positions. Conditions may be 
rendered less stringent by increasing salt concentration and decreasing temperature. For example, 
low stringency hybridization conditions for the present invention provide hybridization in 35% 
formamide, 5x Denhardt's solution, 5x SSC, 25 mM sodium phosphate, 0.1% SDS and 100 
µg/ml denatured salmon sperm DNA or E. coli DNA at 42°C for 16 hours followed by sequential 
washes with 0.8x SSC, 0.1% SDS at 55°C and allow for cross-species hybridization to 
homologous proteins to occur. For example, under these conditions the Drosophila P-TEFb 
small subunit cDNA (SEQ ID NO:1) cross hybridizes to the human 42 kDa protein of SEQ ID 
NO:6. Thus, hybridization conditions can be readily manipulated depending on the desired results. 
Alternative hybridization conditions which are useful are given in Examples 3 and 4. Of course, 
the hybridization conditions chosen are dependent upon the objective that is to be achieved. 
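
The way salt and formamide shift stringency can be illustrated with a widely quoted empirical approximation for the melting temperature of long DNA duplexes. This formula is not taken from the present disclosure, textbooks quote slightly different coefficients, and the sodium concentration is left as a parameter because it is not derived here from the SSC recipes; the sketch is meant only to show the direction of each effect.

```python
import math


def approx_tm_celsius(gc_percent: float, length_nt: int, na_molar: float,
                      formamide_percent: float = 0.0) -> float:
    """Rough melting temperature of a long DNA:DNA duplex, in degrees C.

    Uses the commonly cited empirical form
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    All coefficients are approximations and are an assumption of this sketch.
    """
    if na_molar <= 0 or length_nt <= 0:
        raise ValueError("na_molar and length_nt must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_nt)


# Illustrative comparison for a hypothetical 300-nucleotide, 50% G+C probe.
tm_high_salt = approx_tm_celsius(50, 300, na_molar=0.8, formamide_percent=35)
tm_low_salt = approx_tm_celsius(50, 300, na_molar=0.02, formamide_percent=35)
```

Under this approximation the estimated melting temperature rises with salt concentration and falls with formamide, so a wash at a fixed temperature sits further below the melting temperature (is less stringent) in higher salt, consistent with the statement above.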

In other embodiments, more stringent hybridization may be achieved under conditions of, 
for example, 50% formamide, 5x Denhardt's solution, 5x SSC, 25 mM sodium phosphate, 0.1% 
SDS and 100 µg/ml of denatured salmon sperm DNA at 42°C for 16 h followed by 1 hour 
sequential washes with 0.1x SSC, 0.1% SDS solution at 60°C to remove the desired amount of 
background signal. 

In certain embodiments, it will be advantageous to employ nucleic acid sequences of the 
present invention in combination with an appropriate means, such as a label, for determining 
hybridization. A wide variety of appropriate indicator means are known in the art, including 
fluorescent, radioactive, enzymatic or other ligands, such as avidin/biotin, which are capable of 
being detected. In preferred embodiments, one may desire to employ a fluorescent label or an 
enzyme tag such as urease, alkaline phosphatase or peroxidase, instead of radioactive or other 
environmentally undesirable reagents. In the case of enzyme tags, colorimetric indicator substrates 
are known that can be employed to provide a detection means visible to the human eye or 
spectrophotometrically, to identify specific hybridization with complementary nucleic 
acid-containing samples. 

In general, it is envisioned that the hybridization probes described herein will be useful both 
as reagents in solution hybridization, as in PCR™, for detection of expression of corresponding 
genes, as well as in embodiments employing a solid phase. In embodiments involving a solid 
phase, the test DNA (or RNA) is adsorbed or otherwise affixed to a selected matrix or surface. This 
fixed, single-stranded nucleic acid is then subjected to hybridization with selected probes under 
desired conditions. The selected conditions will depend on the particular circumstances based on 
the particular criteria required (depending, for example, on the G+C content, type of target nucleic 
acid, source of nucleic acid, size of hybridization probe, etc.). Following washing of the hybridized 
surface to remove non-specifically bound probe molecules, hybridization is detected, or even 
quantified, by means of the label. 

2. Amplification and PCR™ 

Nucleic acid used as a template for amplification is isolated from cells contained in the 
biological sample, according to standard methodologies (Sambrook et al., 1989). The nucleic 
acid may be genomic DNA or fractionated or whole cell RNA. Where RNA is used, it may be 
desired to convert the RNA to a cDNA. 

Pairs of primers that selectively hybridize to nucleic acids corresponding to P-TEFb, 
kinase protein or a mutant thereof are contacted with the isolated nucleic acid under conditions 
that permit selective hybridization. The term "primer", as defined herein, is meant to encompass 
any nucleic acid that is capable of priming the synthesis of a nascent nucleic acid in a 
template-dependent process. Typically, primers are oligonucleotides from ten to twenty base pairs in 
length, but longer sequences can be employed. Primers may be provided in double-stranded or 
single-stranded form, although the single-stranded form is preferred. 

Once hybridized, the nucleic acid:primer complex is contacted with one or more enzymes 
that facilitate template-dependent nucleic acid synthesis. Multiple rounds of amplification, also 
referred to as "cycles," are conducted until a sufficient amount of amplification product is 
produced. 

Next, the amplification product is detected. In certain applications, the detection may be 
performed by visual means. Alternatively, the detection may involve indirect identification of 
the product via chemiluminescence, radioactive scintigraphy of incorporated radiolabel or 
fluorescent label or even via a system using electrical or thermal impulse signals (Affymax 
technology). 

A number of template dependent processes are available to amplify the marker sequences 
present in a given template sample. One of the best known amplification methods is the 
polymerase chain reaction (referred to as PCR™) which is described in detail in U.S. Patent Nos. 
4,683,195, 4,683,202 and 4,800,159, each incorporated herein by reference in its entirety. 

Briefly, in PCR™, two primer sequences are prepared that are complementary to regions 
on opposite complementary strands of the marker sequence. An excess of deoxynucleoside 
triphosphates is added to a reaction mixture along with a DNA polymerase, e.g., Taq 
polymerase. If the marker sequence is present in a sample, the primers will bind to the marker 
and the polymerase will cause the primers to be extended along the marker sequence by adding 
on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended 
primers will dissociate from the marker to form reaction products, excess primers will bind to the 
marker and to the reaction products and the process is repeated. 
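
The primer geometry just described, with one primer on each of the two opposite strands, can be sketched as a simple in silico search. The sketch assumes exact primer matches and a single binding site for each primer, and it returns the product as read along the top strand; real primer binding tolerates mismatches and depends on reaction conditions. The function names and the example sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_product(template: str, forward_primer: str,
                      reverse_primer: str) -> Optional[str]:
    """Return the top-strand sequence spanned by the two primers, or None.

    The forward primer matches the top strand directly; the reverse primer
    anneals to the top strand, so its reverse complement is searched for
    downstream of the forward primer.
    """
    top = template.upper()
    start = top.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = top.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return top[start : end + len(reverse_site)]


# Hypothetical 30-base template and primers, for illustration only.
example = predicted_product("GGATCCATGAAACCCGGGTTTTAAGAATTC", "ATGAAA", "TTAAAA")
```

If either primer fails to find its site, no product is predicted, which mirrors the statement that extension occurs only if the marker sequence is present in the sample.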

A reverse transcriptase PCR™ (RT-PCR™) amplification procedure may be performed in 
order to quantify the amount of mRNA amplified or to prepare cDNA from the desired mRNA. 
Methods of reverse transcribing RNA into cDNA are well known and described in Sambrook et 
al., 1989. Alternative methods for reverse transcription utilize thermostable, RNA-dependent 
DNA polymerases. These methods are described in WO 90/07641, filed December 21, 1990, 
incorporated herein by reference. Polymerase chain reaction methodologies are well known in 
the art. 

Another method for amplification is the ligase chain reaction ("LCR"), disclosed in EPA 
No. 320 308, incorporated herein by reference in its entirety. In LCR, two complementary probe 
pairs are prepared, and in the presence of the target sequence, each pair will bind to opposite 
complementary strands of the target such that they abut. In the presence of a ligase, the two 
probe pairs will link to form a single unit. By temperature cycling, as in PCR™, bound ligated 
units dissociate from the target and then serve as "target sequences" for ligation of excess probe 
pairs. U.S. Patent 4,883,750 describes a method similar to LCR for binding probe pairs to a 
target sequence. 

Qbeta Replicase, described in PCT Application No. PCT/US87/00880, incorporated 
herein by reference, may also be used as still another amplification method in the present 
invention. In this method, a replicative sequence of RNA that has a region complementary to 
that of a target is added to a sample in the presence of an RNA polymerase. The polymerase will 
copy the replicative sequence that can then be detected. 

An isothermal amplification method, in which restriction endonucleases and ligases are 
used to achieve the amplification of target molecules that contain nucleotide 
5'-[alpha-thio]-triphosphates in one strand of a restriction site, may also be useful in the 
amplification of nucleic acids in the present invention. 

Strand Displacement Amplification (SDA) is another method of carrying out isothermal 
amplification of nucleic acids which involves multiple rounds of strand displacement and 
synthesis, i.e., nick translation. A similar method, called Repair Chain Reaction (RCR), involves 
annealing several probes throughout a region targeted for amplification, followed by a repair 
reaction in which only two of the four bases are present. The other two bases can be added as 
biotinylated derivatives for easy detection. A similar approach is used in SDA. Target specific 
sequences can also be detected using a cyclic probe reaction (CPR). In CPR, a probe having 3' 
and 5' sequences of non-specific DNA and a middle sequence of specific RNA is hybridized to 
DNA that is present in a sample. Upon hybridization, the reaction is treated with RNase H, and 
the products of the probe are identified as distinctive products that are released after digestion. The 
original template is annealed to another cycling probe and the reaction is repeated. 

Still other amplification methods, described in GB Application No. 2 202 328 and in 
PCT Application No. PCT/US89/01025, each of which is incorporated herein by reference in its 
entirety, may be used in accordance with the present invention. In the former application, 
"modified" primers are used in a PCR™-like, template- and enzyme-dependent synthesis. The 
primers may be modified by labeling with a capture moiety (e.g., biotin) and/or a detector moiety 
(e.g., enzyme). In the latter application, an excess of labeled probes is added to a sample. In 
the presence of the target sequence, the probe binds and is cleaved catalytically. After cleavage, 
the target sequence is released intact to be bound by excess probe. Cleavage of the labeled probe 
signals the presence of the target sequence. 

Other nucleic acid amplification procedures include transcription-based amplification 
systems (TAS), including nucleic acid sequence based amplification (NASBA) and 3SR 
(Gingeras et al., PCT Application WO 88/10315, incorporated herein by reference). In NASBA, 
the nucleic acids can be prepared for amplification by standard phenol/chloroform extraction, 
heat denaturation of a clinical sample, treatment with lysis buffer and minispin columns for 
isolation of DNA and RNA or guanidinium chloride extraction of RNA. These amplification 
techniques involve annealing a primer which has target specific sequences. Following 
polymerization, DNA/RNA hybrids are digested with RNase H while double stranded DNA 
molecules are heat denatured again. In either case the single stranded DNA is made fully double 
stranded by addition of second target specific primer, followed by polymerization. The 
double-stranded DNA molecules are then multiply transcribed by an RNA polymerase such as T7 or 
SP6. In an isothermal cyclic reaction, the RNAs are reverse transcribed into single stranded 
DNA, which is then converted to double stranded DNA, and then transcribed once again with an 
RNA polymerase such as T7 or SP6. The resulting products, whether truncated or complete, 
indicate target specific sequences. 



Davey et al., EPA No. 329 822 (incorporated herein by reference in its entirety) disclose 
a nucleic acid amplification process involving cyclically synthesizing single-stranded RNA 
("ssRNA"), ssDNA, and double-stranded DNA (dsDNA), which may be used in accordance with 
the present invention. The ssRNA is a template for a first primer oligonucleotide, which is 
elongated by reverse transcriptase (RNA-dependent DNA polymerase). The RNA is then 
removed from the resulting DNA:RNA duplex by the action of ribonuclease H (RNase H, an 
RNase specific for RNA in duplex with either DNA or RNA). The resultant ssDNA is a template 
for a second primer, which also includes the sequences of an RNA polymerase promoter 
(exemplified by T7 RNA polymerase) 5' to its homology to the template. This primer is then 
extended by DNA polymerase (exemplified by the large "Klenow" fragment of E. coli DNA 
polymerase I), resulting in a double-stranded DNA ("dsDNA") molecule, having a sequence 
identical to that of the original RNA between the primers and having additionally, at one end, a 
promoter sequence. This promoter sequence can be used by the appropriate RNA polymerase to 
make many RNA copies of the DNA. These copies can then re-enter the cycle leading to very 
swift amplification. With proper choice of enzymes, this amplification can be done isothermally 
without addition of enzymes at each cycle. Because of the cyclical nature of this process, the 
starting sequence can be chosen to be in the form of either DNA or RNA. 

Miller et al., PCT Application WO 89/06700 (incorporated herein by reference in its 
entirety) disclose a nucleic acid sequence amplification scheme based on the hybridization of a 
promoter/primer sequence to a target single-stranded DNA ("ssDNA") followed by transcription 
of many RNA copies of the sequence. This scheme is not cyclic, i.e., new templates are not 
produced from the resultant RNA transcripts. Other amplification methods include "RACE" and 
"one-sided PCR" (Frohman, 1990, incorporated by reference). 

Methods based on ligation of two (or more) oligonucleotides in the presence of nucleic 
acid having the sequence of the resulting "di-oligonucleotide", thereby amplifying the 
di-oligonucleotide, may also be used in the amplification step of the present invention. 

Following any amplification, it may be desirable to separate the amplification product 
from the template and the excess primer for the purpose of determining whether specific 
amplification has occurred. In one embodiment, amplification products are separated by agarose, 
agarose-acrylamide or polyacrylamide gel electrophoresis using standard methods. See 
Sambrook et al., 1989. 

Alternatively, chromatographic techniques may be employed to effect separation. There 
are many kinds of chromatography which may be used in the present invention: adsorption, 
partition, ion-exchange and molecular sieve, and many specialized techniques for using them 
including column, paper, thin-layer and gas chromatography. 

Amplification products must be visualized in order to confirm amplification of the 
marker sequences. One typical visualization method involves staining of a gel with ethidium 
bromide and visualization under UV light. Alternatively, if the amplification products are 
integrally labeled with radio- or fluorometrically-labeled nucleotides, the amplification products 
can then be exposed to x-ray film or visualized under the appropriate stimulating spectra, 
following separation. 

In one embodiment, visualization is achieved indirectly. Following separation of 
amplification products, a labeled, nucleic acid probe is brought into contact with the amplified 
marker sequence. The probe preferably is conjugated to a chromophore but may be radiolabeled. 
In another embodiment, the probe is conjugated to a binding partner, such as an antibody or 
biotin, and the other member of the binding pair carries a detectable moiety. 

In one embodiment, detection is by Southern blotting and hybridization with a labeled 
probe. The techniques involved in Southern blotting are well known to those of skill in the art 
and can be found in many standard books on molecular protocols. See Sambrook et al., 1989. 
Briefly, amplification products are separated by gel electrophoresis. The gel is then contacted 
with a membrane, such as nitrocellulose, permitting transfer of the nucleic acid and non-covalent 
binding. Subsequently, the membrane is incubated with a chromophore-conjugated probe that is 
capable of hybridizing with a target amplification product. Detection is by exposure of the 
membrane to x-ray film or ion-emitting detection devices. 

One example of the foregoing is described in U.S. Patent No. 5,279,721, incorporated by 
reference herein, which discloses an apparatus and method for the automated electrophoresis and 
transfer of nucleic acids. The apparatus permits electrophoresis and blotting without external 
manipulation of the gel and is ideally suited to carrying out methods according to the present 
invention. 

All the essential materials and reagents required for detecting P-TEFb or kinase protein 
markers in a biological sample may be assembled together in a kit. This generally will comprise 
preselected primers for specific markers. Also included may be enzymes suitable for amplifying 
nucleic acids including various polymerases (RT, Taq, etc.), deoxynucleotides and buffers to 
provide the necessary reaction mixture for amplification. 

Such kits generally will comprise, in suitable means, distinct containers for each 
individual reagent and enzyme as well as for each marker primer pair. Preferred pairs of primers 
for amplifying nucleic acids are selected to amplify the sequences specified in SEQ ID NO:1 or 
SEQ ID NO:3 or SEQ ID NO:43 or SEQ ID NO:46 or SEQ ID NO:48 such that, for example, 
nucleic acid fragments are prepared that include a contiguous stretch of nucleotides identical to, 
for example, about 14-20, 25, 30, 35, etc.; 48, 49, 50, 51, etc.; 75, 76, 77, 78, 79, 80 etc.; 100, 
101, 102, 103 etc.; 118, 119, 120, 121 etc.; 127, 128, 129, 130, 131, etc.; 316, 317, 318, 319, 
etc.; 322, 323, 324, 325, 326, etc.; 361, 362, 363, 364, etc.; 372, 373, 374, 375, etc. of SEQ ID 
NO:3, SEQ ID NO:48, SEQ ID NO:43 or SEQ ID NO:4, so long as the selected contiguous 
stretches are from spatially distinct regions. Similar fragments may be prepared which are 
identical or complementary to SEQ ID NO:1 such that the fragments do not hybridize to SEQ ID 
NO:5. 

In another embodiment, such kits will comprise hybridization probes specific for P-TEFb 
large subunit or kinase proteins chosen from a group including nucleic acids corresponding to the 
sequences specified in SEQ ID NO:1 or SEQ ID NO:3 or SEQ ID NO:43 or SEQ ID NO:46 or 
SEQ ID NO:48 or to intermediate lengths of the sequences specified. Such kits generally will 
comprise, in suitable means, distinct containers for each individual reagent and enzyme as well 
as for each marker hybridization probe. 

3. Other Assays 

Other methods for genetic screening to accurately detect genetic changes which may be 
caused by disease, such as cancers, viral or parasitic infections that alter normal cellular 
production and processing, in genomic DNA, cDNA or RNA samples may be employed, 
depending on the specific situation. 

For example, one method of screening for genetic variation is based on RNase cleavage 
of base pair mismatches in RNA/DNA and RNA/RNA heteroduplexes. As used herein, the term 
"mismatch" is defined as a region of one or more unpaired or mispaired nucleotides in a double- 
stranded RNA/RNA, RNA/DNA or DNA/DNA molecule. This definition thus includes 
mismatches due to insertion/deletion mutations, as well as single and multiple base point 
mutations.

U.S. Patent No. 4,946,773 describes an RNase A mismatch cleavage assay that involves
annealing single-stranded DNA or RNA test samples to an RNA probe, and subsequent treatment 
of the nucleic acid duplexes with RNase A. After the RNase cleavage reaction, the RNase is 
inactivated by proteolytic digestion and organic extraction, and the cleavage products are 
denatured by heating and analyzed by electrophoresis on denaturing polyacrylamide gels. For 
the detection of mismatches, the single-stranded products of the RNase A treatment, 
electrophoretically separated according to size, are compared to similarly treated control 
duplexes. Samples containing smaller fragments (cleavage products) not seen in the control 
duplex are scored as +. 
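
The scoring step just described may be sketched in Python as follows. This is an illustrative simplification rather than part of the assay itself: it assumes that probe and target are supplied as aligned, equal-length sequences written in the same sense, that every mismatch is cleaved immediately 3' of the mismatched position, and that fragment sizes are read exactly from the gel. All function names and the 20-nucleotide example sequence are hypothetical.

```python
def mismatch_positions(probe_sense, target_sense):
    """0-based positions at which two aligned, equal-length sequences differ."""
    if len(probe_sense) != len(target_sense):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (p, t) in enumerate(zip(probe_sense, target_sense)) if p != t]


def cleavage_fragment_sizes(probe_length, mismatches):
    """Sizes of the probe fragments expected if the probe is cut
    immediately 3' of every mismatched position (an assumption)."""
    boundaries = [0] + [m + 1 for m in sorted(mismatches)] + [probe_length]
    sizes = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return [s for s in sizes if s > 0]


def score_sample(sample_sizes, control_sizes):
    """Score '+' when the sample shows fragments that are absent from the
    control and smaller than the full-length protected probe."""
    full_length = max(control_sizes)
    novel = set(sample_sizes) - set(control_sizes)
    return "+" if any(size < full_length for size in novel) else "-"


# Hypothetical 20-nt wild-type region and a single-base variant at index 10.
wild_type = "ACGTACGTACGTACGTACGT"
mutant = wild_type[:10] + "A" + wild_type[11:]

control = cleavage_fragment_sizes(len(wild_type), mismatch_positions(wild_type, wild_type))
sample = cleavage_fragment_sizes(len(wild_type), mismatch_positions(wild_type, mutant))
```

Here the control duplex yields only the full-length 20-nucleotide product, while the variant yields two smaller fragments and is therefore scored as +.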

Currently available RNase mismatch cleavage assays, including those performed 
according to U.S. Patent No. 4,946,773, require the use of radiolabeled RNA probes. Myers and 
Maniatis in U.S. Patent No. 4,946,773 describe the detection of base pair mismatches using 
RNase A. Other investigators have described the use of an E. coli enzyme, RNase I, in mismatch
assays. Because it has broader cleavage specificity than RNase A, RNase I would be a desirable 
enzyme to employ in the detection of base pair mismatches if components can be found to 
decrease the extent of non-specific cleavage and increase the frequency of cleavage of 
mismatches. The use of RNase I for mismatch detection is described in literature from Promega 
Biotech. Promega markets a kit containing RNase I that is shown in their literature to cleave 
three out of four known mismatches, provided the enzyme level is sufficiently high. 

The RNase protection assay was first used to detect and map the ends of specific mRNA 
targets in solution. The assay relies on being able to easily generate high specific activity 
radiolabeled RNA probes complementary to the mRNA of interest by in vitro transcription. 
Originally, the templates for in vitro transcription were recombinant plasmids containing 
bacteriophage promoters. The probes are mixed with total cellular RNA samples to permit 
hybridization to their complementary targets, then the mixture is treated with RNase to degrade 
excess unhybridized probe. Also, as originally intended, the RNase used is specific for single- 
stranded RNA, so that hybridized double-stranded probe is protected from degradation. After
inactivation and removal of the RNase, the protected probe (which is proportional in amount to
the amount of target mRNA that was present) is recovered and analyzed on a polyacrylamide gel. 

The RNase protection assay was adapted for detection of single base mutations. In this
type of RNase A mismatch cleavage assay, radiolabeled RNA probes transcribed in vitro from
wild type sequences are hybridized to complementary target regions derived from test samples.
The test target generally comprises DNA (either genomic DNA or DNA amplified by cloning in
plasmids or by PCR™), although RNA targets (endogenous mRNA) have occasionally been
used. If single nucleotide (or greater) sequence differences occur between the hybridized probe
and target, the resulting disruption in Watson-Crick hydrogen bonding at that position
("mismatch") can be recognized and cleaved in some cases by single-strand specific
ribonuclease. To date, RNase A has been used almost exclusively for cleavage of single-base
mismatches, although RNase I has recently been shown as useful also for mismatch cleavage.
There are recent descriptions of using the MutS protein and other DNA-repair enzymes for
detection of single-base mismatches.

C. Mutagenesis 

Site-specific mutagenesis is a technique useful in the preparation of individual peptides,
or biologically functional equivalent proteins or peptides, through specific mutagenesis of the
underlying DNA. The technique further provides a ready ability to prepare and test sequence
variants, incorporating one or more of the foregoing considerations, by introducing one or more
nucleotide sequence changes into the DNA. Site-specific mutagenesis allows the production of
mutants through the use of specific oligonucleotide sequences which encode the DNA sequence
of the desired mutation, as well as a sufficient number of adjacent nucleotides, to provide a
primer sequence of sufficient size and sequence complexity to form a stable duplex on both sides
of the deletion junction being traversed. Typically, a primer of about 17 to 25 nucleotides in
length is preferred, with about 5 to 10 residues on both sides of the junction of the sequence
being altered.
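
The primer geometry described above may be illustrated with a short Python sketch. It is offered only as an example under stated assumptions: it handles a single-base substitution, returns the primer in the same sense as the supplied template (in practice the oligonucleotide is synthesized to anneal to the complementary strand), and uses a hypothetical function name, a hypothetical default flank of 10 residues and a hypothetical 30-nucleotide template.

```python
def design_mutagenic_primer(template, position, new_base, flank=10):
    """Return a primer carrying a single-base substitution at `position`
    (0-based) with `flank` unchanged residues on each side.

    The 17-25 nt length window and the minimum of 5 flanking residues
    follow the preferences stated in the text; the rest is illustrative.
    """
    template = template.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G or T")
    if template[position] == new_base:
        raise ValueError("substitution does not change the sequence")
    if flank < 5:
        raise ValueError("at least about 5 flanking residues are preferred")
    if position - flank < 0 or position + flank >= len(template):
        raise ValueError("not enough template on both sides of the mutation")
    primer = (template[position - flank:position]
              + new_base
              + template[position + 1:position + 1 + flank])
    if not 17 <= len(primer) <= 25:
        raise ValueError("primer length falls outside the preferred 17-25 nt")
    return primer


# Hypothetical 30-nt template; change the G at index 15 to an A.
template = "ATGGCTAGCAAGCTTGGATCCGAATTCCTG"
primer = design_mutagenic_primer(template, 15, "A", flank=10)
```

With a flank of 10 residues the resulting primer is 21 nucleotides long, with the altered base at its center.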

In general, the technique of site-specific mutagenesis is well known in the art. As will be
appreciated, the technique typically employs a bacteriophage vector that exists in both a single
stranded and double stranded form. Typical vectors useful in site-directed mutagenesis include
vectors such as the M13 phage. These phage vectors are commercially available and their use is
generally well known to those skilled in the art. Double stranded plasmids are also routinely
employed in site directed mutagenesis, which eliminates the step of transferring the gene of
interest from a phage to a plasmid.

In general, site-directed mutagenesis is performed by first obtaining a single-stranded
vector, or melting of two strands of a double stranded vector which includes within its sequence
a DNA sequence encoding the desired protein. An oligonucleotide primer bearing the desired
mutated sequence is synthetically prepared. This primer is then annealed with the single-stranded
DNA preparation, and subjected to DNA polymerizing enzymes such as E. coli polymerase I
Klenow fragment, in order to complete the synthesis of the mutation-bearing strand. Thus, a
heteroduplex is formed wherein one strand encodes the original non-mutated sequence and the
second strand bears the desired mutation. This heteroduplex vector is then used to transform
appropriate cells, such as E. coli cells, and clones are selected that include recombinant vectors
bearing the mutated sequence arrangement.

The preparation of sequence variants of the selected gene using site-directed mutagenesis
is provided as a means of producing potentially useful species and is not meant to be limiting, as
there are other ways in which sequence variants of genes may be obtained. For example,
recombinant vectors encoding the desired gene may be treated with mutagenic agents, such as
hydroxylamine, to obtain sequence variants.

D. Recombinant Vectors, Host Cells and Expression 

Recombinant vectors form important further aspects of the present invention. The term 
"expression vector or construct" means any type of genetic construct containing a nucleic acid 
coding for a gene product in which part or all of the nucleic acid encoding sequence is capable of 
being transcribed. The transcript may be translated into a protein, but it need not be. Thus, in
certain embodiments, expression includes both transcription of a gene and translation of an RNA
into a gene product. In other embodiments, expression only includes transcription of the nucleic 
acid, for example, to generate antisense constructs. 

Particularly useful vectors are contemplated to be those vectors in which the coding 
portion of the DNA segment, whether encoding a full length protein or smaller peptide, is 
positioned under the transcriptional control of a promoter. A "promoter" refers to a DNA 
sequence recognized by the synthetic machinery of the cell, or introduced synthetic machinery, 
required to initiate the specific transcription of a gene. The phrases "operatively positioned",
"under control" or "under transcriptional control" mean that the promoter is in the correct
location and orientation in relation to the nucleic acid to control RNA polymerase initiation and 
expression of the gene. 

The promoter may be in the form of the promoter that is naturally associated with a 
P-TEFb or kinase protein gene, as may be obtained by isolating the 5' non-coding sequences 
located upstream of the coding segment or exon, for example, using recombinant cloning and/or 
PCR™ technology, in connection with the compositions disclosed herein (PCR™ technology is 
disclosed in U.S. Patent 4,683,202 and U.S. Patent 4,683,195, each incorporated herein by
reference). 

In other embodiments, it is contemplated that certain advantages will be gained by 
positioning the coding DNA segment under the control of a recombinant, or heterologous, 
promoter. As used herein, a recombinant or heterologous promoter is intended to refer to a 
promoter that is not normally associated with a P-TEFb or a kinase protein gene in its natural 
environment. Such promoters may include promoters normally associated with other genes, 
and/or promoters isolated from any other bacterial, viral, eukaryotic, or mammalian cell. 

Naturally, it will be important to employ a promoter that effectively directs the 
expression of the DNA segment in the cell type, organism, or even animal, chosen for
expression. The use of promoter and cell type combinations for protein expression is generally
known to those of skill in the art of molecular biology, for example, see Sambrook et al. (1989),
incorporated herein by reference. The promoters employed may be constitutive, or inducible, 
and can be used under the appropriate conditions to direct high level expression of the introduced 
DNA segment, such as is advantageous in the large-scale production of recombinant proteins or 
peptides. 

At least one module in a promoter functions to position the start site for RNA synthesis. 
The best known example of this is the TATA box, but in some promoters lacking a TATA box, 
such as the promoter for the mammalian terminal deoxynucleotidyl transferase gene and the 
promoter for the SV40 late genes, a discrete element overlying the start site itself helps to fix the 
place of initiation. 

Additional promoter elements regulate the frequency of transcriptional initiation. 
Typically, these are located in the region 30-110 bp upstream of the start site, although a number
of promoters have been shown to contain functional elements downstream of the start site as 
well. The spacing between promoter elements frequently is flexible, so that promoter function is 
preserved when elements are inverted or moved relative to one another. In the thymidine kinase 
(tk) promoter, the spacing between promoter elements can be increased to 50 bp apart before 
activity begins to decline. Depending on the promoter, it appears that individual elements can 
function either co-operatively or independently to activate transcription. 

The particular promoter that is employed to control the expression of a nucleic acid is not 
believed to be critical, so long as it is capable of expressing the nucleic acid in the targeted cell. 
Thus, where a human cell is targeted, it is preferable to position the nucleic acid coding region 
adjacent to and under the control of a promoter that is capable of being expressed in a human 
cell. Generally speaking, such a promoter might include either a human or viral promoter. 
Preferred promoters include those derived from HSV, including the HNF1α promoter. Another
preferred embodiment is the tetracycline controlled promoter.

In various other embodiments, the human cytomegalovirus (CMV) immediate early gene
promoter, the Simian virus 40 (SV40) early promoter and the Rous sarcoma virus long terminal
repeat can be used to obtain high-level expression of transgenes. The use of other viral or
mammalian cellular or bacterial phage promoters which are well-known in the art to achieve
expression of a transgene is contemplated as well, provided that the levels of expression are
sufficient for a given purpose. The following tables list several elements/promoters which may
be employed, in the context of the present invention, to regulate the expression of P-TEFb or a
kinase protein gene. This list is not intended to be exhaustive of all the possible elements
involved in the promotion of transgene expression but, merely, to be exemplary thereof.

Enhancers were originally detected as genetic elements that increased transcription from a
promoter located at a distant position on the same molecule of DNA. This ability to act over a
large distance had little precedent in classic studies of prokaryotic transcriptional regulation.
Subsequent work showed that regions of DNA with enhancer activity are organized much like
promoters. That is, they are composed of many individual elements, each of which binds to one
or more transcriptional proteins.

The basic distinction between enhancers and promoters is operational. An enhancer
region as a whole must be able to stimulate transcription at a distance; this need not be true of a
promoter region or its component elements. On the other hand, a promoter must have one or
more elements that direct initiation of RNA synthesis at a particular site and in a particular
orientation, whereas enhancers lack these specificities. Promoters and enhancers are often
overlapping and contiguous, often seeming to have a very similar modular organization.

Additionally any promoter/enhancer combination (as per the Eukaryotic Promoter Data 
Base EPDB) could also be used to drive expression of a transgene. Use of a T3, T7 or SP6 
cytoplasmic expression system is another possible embodiment. Eukaryotic cells can support 
cytoplasmic transcription from certain bacterial promoters if the appropriate bacterial polymerase
is provided, either as part of the delivery complex or as an additional genetic expression
construct. 

PROMOTER TABLE

PROMOTER                                  REFERENCES

Immunoglobulin Heavy Chain                Banerji et al., 1983; Gilles et al., 1983; Grosschedl and
                                          Baltimore, 1985; Atchinson and Perry, 1986, 1987; Imler
                                          et al., 1987; Weinberger et al., 1988; Kiledjian et al.,
                                          1988; Porton et al., 1990
Immunoglobulin Light Chain                Queen and Baltimore, 1983; Picard and Schaffner, 1984
T-Cell Receptor                           Luria et al., 1987; Winoto and Baltimore, 1989; Redondo
                                          et al., 1990
HLA DQ α and DQ β                         Sullivan and Peterlin, 1987
β-Interferon                              Goodbourn et al., 1986; Fujita et al., 1987; Goodbourn
                                          and Maniatis, 1985
Interleukin-2                             Greene et al., 1989
Interleukin-2 Receptor                    Greene et al., 1989; Lin et al., 1990
MHC Class II 5                            Koch et al., 1989
MHC Class II HLA-DRα                      Sherman et al., 1989
β-Actin                                   Kawamoto et al., 1988; Ng et al., 1989
Muscle Creatine Kinase                    Jaynes et al., 1988; Horlick and Benfield, 1989; Johnson
                                          et al., 1989a
Prealbumin (Transthyretin)                Costa et al., 1988
Elastase I                                Ornitz et al., 1987
Metallothionein                           Karin et al., 1987; Culotta and Hamer, 1989
Collagenase                               Pinkert et al., 1987; Angel et al., 1987
Albumin Gene                              Pinkert et al., 1987; Tronche et al., 1989, 1990
α-Fetoprotein                             Godbout et al., 1988; Campere and Tilghman, 1989
γ-Globin                                  Bodine and Ley, 1987; Perez-Stable and Constantini, 1990
β-Globin                                  Trudel and Constantini, 1987
c-fos                                     Cohen et al., 1987
c-HA-ras                                  Triesman, 1986; Deschamps et al., 1985
Insulin                                   Edlund et al., 1985
Neural Cell Adhesion Molecule (NCAM)      Hirsch et al., 1990
α1-Antitrypsin                            Latimer et al., 1990
H2B (TH2B) Histone                        Hwang et al., 1990
Mouse or Type I Collagen                  Ripe et al., 1989
Glucose-Regulated Proteins (GRP94 and     Chang et al., 1989
GRP78)
Rat Growth Hormone                        Larsen et al., 1986
Human Serum Amyloid A (SAA)               Edbrooke et al., 1989
Troponin I (TN I)                         Yutzey et al., 1989
Platelet-Derived Growth Factor            Pech et al., 1989
Duchenne Muscular Dystrophy               Klamut et al., 1990
SV40                                      Banerji et al., 1981; Moreau et al., 1981; Sleigh and
                                          Lockett, 1985; Firak and Subramanian, 1986; Herr and
                                          Clarke, 1986; Imbra and Karin, 1986; Kadesch and Berg,
                                          1986; Wang and Calame, 1986; Ondek et al., 1987; Kuhl
                                          et al., 1987; Schaffner et al., 1988
Polyoma                                   Swartzendruber and Lehman, 1975; Vasseur et al., 1980;
                                          Katinka et al., 1980, 1981; Tyndell et al., 1981; Dandolo
                                          et al., 1983; deVilliers et al., 1984; Hen et al., 1986;
                                          Satake et al., 1988; Campbell and Villarreal, 1988
Retroviruses                              Kriegler and Botchan, 1982, 1983; Levinson et al., 1982;
                                          Kriegler et al., 1983, 1984a,b, 1988; Bosze et al., 1986;
                                          Miksicek et al., 1986; Celander and Haseltine, 1987;
                                          Thiesen et al., 1988; Celander et al., 1988; Choi et al.,
                                          1988; Reisman and Rotter, 1989
Papilloma Virus                           Campo et al., 1983; Lusky et al., 1983; Spandidos and
                                          Wilkie, 1983; Spalholz et al., 1985; Lusky and Botchan,
                                          1986; Cripe et al., 1987; Gloss et al., 1987; Hirochika
                                          et al., 1987; Stephens and Hentschel, 1987; Glue et al.,
                                          1988
Hepatitis B Virus                         Bulla and Siddiqui, 1986; Jameel and Siddiqui, 1986;
                                          Shaul and Ben-Levy, 1987; Spandau and Lee, 1988; Vannice
                                          and Levinson, 1988
Human Immunodeficiency Virus              Muesing et al., 1987; Hauber and Cullan, 1988; Jakobovits
                                          et al., 1988; Feng and Holland, 1988; Takebe et al., 1988;
                                          Rowen et al., 1988; Berkhout et al., 1989; Laspia et al.,
                                          1989; Sharp and Marciniak, 1989; Braddock et al., 1989
Cytomegalovirus                           Weber et al., 1984; Boshart et al., 1985; Foecking and
                                          Hofstetter, 1986
Gibbon Ape Leukemia Virus                 Holbrook et al., 1987; Quinn et al., 1989

ENHANCER TABLE

ELEMENT                             INDUCER                         REFERENCES

MT II                               Phorbol Ester (TPA),            Palmiter et al., 1982; Haslinger and Karin,
                                    Heavy metals                    1985; Searle et al., 1985; Stuart et al.,
                                                                    1985; Imagawa et al., 1987; Karin et al.,
                                                                    1987; Angel et al., 1987b; McNeall et al.,
                                                                    1989
MMTV (mouse mammary tumor virus)    Glucocorticoids                 Huang et al., 1981; Lee et al., 1981;
                                                                    Majors and Varmus, 1983; Chandler et al.,
                                                                    1983; Lee et al., 1984; Ponta et al., 1985;
                                                                    Sakai et al., 1986
β-Interferon                        poly(rI)X, poly(rc)             Tavernier et al., 1983
Adenovirus 5 E2                     E1a                             Imperiale and Nevins, 1984
Collagenase                         Phorbol Ester (TPA)             Angel et al., 1987a
Stromelysin                         Phorbol Ester (TPA)             Angel et al., 1987b
SV40                                Phorbol Ester (TPA)             Angel et al., 1987b
Murine MX Gene                      Interferon, Newcastle
                                    Disease Virus
GRP78 Gene                          A23187                          Resendez et al., 1988
α-2-Macroglobulin                   IL-6                            Kunz et al., 1989
Vimentin                            Serum                           Rittling et al., 1989
MHC Class I Gene H-2kb              Interferon                      Blanar et al., 1989
HSP70                                                               Taylor et al., 1989; Taylor and Kingston,
                                                                    1990a,b
Proliferin                          Phorbol Ester-TPA               Mordacq and Linzer, 1989
Tumor Necrosis Factor               PMA                             Hensel et al., 1989
Thyroid Stimulating Hormone         Thyroid Hormone                 Chatterjee et al., 1989
α Gene

Turning to the expression of the P-TEFb or kinase proteins of the present invention, once
a suitable clone or clones have been obtained, whether they be cDNA based or genomic, one may 
proceed to prepare an expression system. The engineering of DNA segment(s) for expression in 
a prokaryotic or eukaryotic system may be performed by techniques generally known to those of 
skill in recombinant expression. It is believed that virtually any expression system may be 
employed in the expression of the proteins of the present invention. 

Two in vitro systems that best represent the features of the in vivo situation are nuclear 
extracts from HeLa and Drosophila Kc cells (KcN). In HeLa nuclear extracts, transcription
initiating at the HIV LTR promoter gives rise to an abundance of transcripts terminated close to
the promoter and only very limited amounts of longer transcripts. Addition of HIV Tat protein
increases the number of DRB-sensitive elongation complexes that are able to make long
transcripts (Marciniak and Sharp, 1991). Using Drosophila KcN extracts, only a fraction of RNA
polymerase II molecules that initiate generate transcripts longer than several hundred nucleotides
(Kephart et al., 1992). The addition of DRB to KcN extracts selectively inhibits the production
of elongation complexes capable of sustained elongation. Two factors, P-TEF (positive
transcription elongation factor) and N-TEF (negative transcription elongation factor), were
proposed to control this behavior (Marshall and Price, 1992).

The inventor has characterized processive elongation complexes in HeLa nuclear extracts 
that are either DRB-sensitive or -insensitive. The DRB-sensitive processive elongation 
complexes produce runoff transcripts at a faster rate than DRB-insensitive processive elongation 
complexes. The immobilized DNA template studies suggest the existence of a HeLa P-TEFb- 
like activity. The number of DRB-sensitive processive elongation complexes appears to be 
determined by the amount of P-TEFb activity. The inventor shows that Drosophila and HeLa 
P-TEFb activities are DRB-sensitive and can act cross-species in the transition into productive 
elongation. 

Both cDNA and genomic sequences are suitable for eukaryotic expression, as the host 
cell will generally process the genomic transcripts to yield functional mRNA for translation into
protein. Generally speaking, it may be more convenient to employ as the recombinant gene a
cDNA version of the gene. It is believed that the use of a cDNA version will provide advantages
in that the size of the gene will generally be much smaller and more readily employed to
transfect the targeted cell than will a genomic gene, which will typically be up to an order of
magnitude larger than the cDNA gene. However, the inventor does not exclude the possibility of
employing a genomic version of a particular gene where desired.



In expression, one will typically include a polyadenylation signal to effect proper
polyadenylation of the transcript. The nature of the polyadenylation signal is not believed to be
crucial to the successful practice of the invention, and any such sequence may be employed.
Preferred embodiments include the SV40 polyadenylation signal and the bovine growth hormone
polyadenylation signal, both of which are convenient and known to function well in various
target cells. Also contemplated as an element of the expression cassette is a terminator. These
elements can serve to enhance message levels and to minimize read through from the cassette
into other sequences.

A specific initiation signal also may be required for efficient translation of coding
sequences. These signals include the ATG initiation codon and adjacent sequences. Exogenous
translational control signals, including the ATG initiation codon, may need to be provided. One
of ordinary skill in the art would readily be capable of determining this and providing the
necessary signals. It is well known that the initiation codon must be "in-frame" with the reading
frame of the desired coding sequence to ensure translation of the entire insert. The exogenous
translational control signals and initiation codons can be either natural or synthetic. The
efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer
elements.
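
The "in-frame" requirement stated above lends itself to a simple check, sketched here in Python for illustration. The sketch assumes a DNA-sense construct, a known index for the exogenous ATG and a known index for the first base of the desired coding sequence; the function name, the leader and linker sequences, and the short insert are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def exogenous_atg_in_frame(construct, atg_index, cds_start):
    """True if the ATG at `atg_index` is in the same reading frame as the
    coding sequence starting at `cds_start` (both 0-based) and no stop
    codon lies between them in that frame."""
    construct = construct.upper()
    if construct[atg_index:atg_index + 3] != "ATG":
        return False
    # The distance between the ATG and the insert must be a multiple of three.
    if cds_start < atg_index or (cds_start - atg_index) % 3 != 0:
        return False
    # Walk codon by codon from the ATG up to the start of the insert.
    for i in range(atg_index, cds_start, 3):
        if construct[i:i + 3] in STOP_CODONS:
            return False
    return True


# Hypothetical pieces: a leader, a 6-nt linker and a short coding insert.
leader, insert = "GCCACC", "GCTAAAGGTTAA"
in_frame = leader + "ATG" + "GGATCC" + insert    # insert starts at index 15
frame_shifted = leader + "ATG" + "GGATC" + insert  # insert starts at index 14
with_stop = leader + "ATG" + "TAAGGA" + insert     # stop codon follows the ATG
```

Only the first construct passes the check; the second is shifted by one nucleotide and the third is interrupted by a stop codon before the insert is reached.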


It is proposed that the small subunit of P-TEFb may be co-expressed with the large
subunit of P-TEFb, wherein the proteins may be co-expressed in the same cell or wherein one
subunit, for example the small subunit, may be provided to a cell that already has the other
subunit, for example the large subunit, of P-TEFb. Co-expression may be achieved by
co-transfecting the cell with two distinct recombinant vectors, each bearing a copy of one of the
respective DNAs. Alternatively, a single recombinant vector may be constructed to include the
coding regions for both of the subunits, which could then be expressed in cells transfected with
the single vector. In either event, the term "co-expression" herein refers to the expression of both
of the subunits of P-TEFb in the same recombinant cell.


As used herein, the terms "engineered" and "recombinant" cells are intended to refer to a 
cell into which an exogenous DNA segment or gene, such as a cDNA or gene encoding a 
P-TEFb or kinase protein has been introduced. Therefore, engineered cells are distinguishable 
from naturally occurring cells which do not contain a recombinantly introduced exogenous DNA 
segment or gene. Engineered cells are thus cells having a gene or genes introduced through the
hand of man. Recombinant cells include those having an introduced cDNA or genomic gene, 
and also include genes positioned adjacent to a promoter not naturally associated with the 
particular introduced gene. 

To express a recombinant P-TEFb or kinase protein, whether mutant or wild-type, in
accordance with the present invention one would prepare an expression vector that comprises a
P-TEFb- or kinase protein-encoding nucleic acid under the control of one or more promoters. To
bring a coding sequence "under the control of" a promoter, one positions the 5' end of the
transcription initiation site of the transcriptional reading frame generally between about 1 and
about 50 nucleotides "downstream" of (i.e., 3' of) the chosen promoter. The "upstream"
promoter stimulates transcription of the DNA and promotes expression of the encoded
recombinant protein. This is the meaning of "recombinant expression" in this context.

Many standard techniques are available to construct expression vectors containing the 
appropriate nucleic acids and transcriptional/translational control sequences in order to achieve
protein or peptide expression in a variety of host-expression systems. Cell types available for 
expression include, but are not limited to, bacteria, such as E. coli and B. subtilis transformed 
with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors.

Certain examples of prokaryotic hosts are E. coli strain RR1, E. coli LE392, E. coli B,
E. coli χ1776 (ATCC No. 31537) as well as E. coli W3110 (F-, lambda-, prototrophic, ATCC
No. 273325); bacilli such as Bacillus subtilis; and other enterobacteriaceae such as Salmonella 
typhimurium, Serratia marcescens, and various Pseudomonas species. 

In general, plasmid vectors containing replicon and control sequences which are derived 
from species compatible with the host cell are used in connection with these hosts. The vector 
ordinarily carries a replication site, as well as marking sequences which are capable of providing 
phenotypic selection in transformed cells. For example, E. coli is often transformed using 
pBR322, a plasmid derived from an E. coli species. pBR322 contains genes for ampicillin and 
tetracycline resistance and thus provides easy means for identifying transformed cells. The pBR 
plasmid, or other microbial plasmid or phage must also contain, or be modified to contain, 
promoters which can be used by the microbial organism for expression of its own proteins. 

In addition, phage vectors containing replicon and control sequences that are compatible 
with the host microorganism can be used as transforming vectors in connection with these hosts. 
For example, the phage lambda GEM™-11 may be utilized in making a recombinant phage
vector which can be used to transform host cells, such as E. coli LE392. 

Further useful vectors include pIN vectors and pGEX vectors, for use in generating 
glutathione S-transferase (GST) soluble fusion proteins for later purification and separation or 
cleavage. Other suitable fusion proteins are those with β-galactosidase, ubiquitin, and the like.

Promoters that are most commonly used in recombinant DNA construction include the
β-lactamase (penicillinase), lactose and tryptophan (trp) promoter systems. While these are the
most commonly used, other microbial promoters have been discovered and utilized, and details 
concerning their nucleotide sequences have been published, enabling those of skill in the art to 
ligate them functionally with plasmid vectors.

The following details concerning recombinant protein production in bacterial cells, such
as E. coli, are provided by way of exemplary information on recombinant protein production in 
general, the adaptation of which to a particular recombinant expression system will be known to 
those of skill in the art. 


Bacterial cells, for example, E. coli, containing the expression vector are grown in any of 
a number of suitable media, for example, LB. The expression of the recombinant protein may be 
induced, e.g., by adding IPTG to the media or by switching incubation to a higher temperature. 
After culturing the bacteria for a further period, generally of between 2 and 24 h, the cells are 
collected by centrifugation and washed to remove residual media.

The bacterial cells are then lysed, for example, by disruption in a cell homogenizer and
centrifuged to separate the dense inclusion bodies and cell membranes from the soluble cell
components. This centrifugation can be performed under conditions whereby the dense inclusion
bodies are selectively enriched by incorporation of sugars, such as sucrose, into the buffer and
centrifugation at a selective speed.

If the recombinant protein is expressed in the inclusion bodies, as is the case in many
instances, these can be washed in any of several solutions to remove some of the contaminating
host proteins, then solubilized in solutions containing high concentrations of urea (e.g. 8 M) or
chaotropic agents such as guanidine hydrochloride in the presence of reducing agents, such as
β-mercaptoethanol or DTT (dithiothreitol).

Under some circumstances, it may be advantageous to incubate the protein for several hours
under conditions suitable for the protein to undergo a refolding process into a conformation
which more closely resembles that of the native protein. Such conditions generally include low 
protein concentrations, less than 500 µg/ml, low levels of reducing agent, concentrations of urea
less than 2 M and often the presence of reagents such as a mixture of reduced and oxidized 
glutathione which facilitate the interchange of disulfide bonds within the protein molecule. 

The refolding process can be monitored, for example, by SDS-PAGE, or with antibodies
specific for the native molecule (which can be obtained from animals vaccinated with the native
molecule or smaller quantities of recombinant protein). Following refolding, the protein can then
be purified further and separated from the refolding mixture by chromatography on any of
several supports including ion exchange resins, gel permeation resins or on a variety of affinity
columns.

For expression in Saccharomyces, the plasmid YRp7, for example, is commonly used.
This plasmid already contains the trp1 gene, which provides a selection marker for a mutant strain
of yeast lacking the ability to grow in the absence of tryptophan, for example ATCC No. 44076 or
PEP4-1. The presence of the trp1 lesion as a characteristic of the yeast host cell genome then
provides an effective environment for detecting transformation by growth in the absence of
tryptophan.

Suitable promoting sequences in yeast vectors include the promoters for
3-phosphoglycerate kinase or other glycolytic enzymes, such as enolase, glyceraldehyde-3-phosphate
dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate
isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase,
phosphoglucose isomerase, and glucokinase. In constructing suitable expression plasmids, the
termination sequences associated with these genes are also ligated into the expression vector 3'
of the sequence desired to be expressed to provide polyadenylation of the mRNA and
termination.

Other suitable promoters, which have the additional advantage of transcription controlled 
by growth conditions, include the promoter region for alcohol dehydrogenase 2, 
isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism,
and the aforementioned glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible 
for maltose and galactose utilization.

In addition to microorganisms, cultures of cells derived from multicellular organisms
may also be used as hosts. In principle, any such cell culture is workable, whether from
vertebrate or invertebrate culture. In addition to mammalian cells, these include insect cell
systems infected with recombinant virus expression vectors (e.g., baculovirus); and plant cell
systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus,
CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression
vectors (e.g., Ti plasmid) containing one or more P-TEFb or kinase protein coding sequences.

In a useful insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is
used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. The
P-TEFb or kinase protein coding sequences are cloned into non-essential regions (for example,
the polyhedrin gene) of the virus and placed under control of an AcNPV promoter (for example,
the polyhedrin promoter). Successful insertion of the coding sequences results in the inactivation
of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus lacking the
proteinaceous coat coded for by the polyhedrin gene). These recombinant viruses are then used
to infect Spodoptera frugiperda cells in which the inserted gene is expressed (e.g., U.S. Patent
No. 4,215,051, Smith, incorporated herein by reference).

Examples of useful mammalian host cell lines are VERO and HeLa cells, Chinese
hamster ovary (CHO) cell lines, WI38, BHK, COS-7, 293, HepG2, 3T3, RIN and MDCK cell
lines. In addition, a host cell strain may be chosen that modulates the expression of the inserted
sequences, or modifies and processes the gene product in the specific fashion desired. Such
modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be
important for the function of the protein.

Different host cells have characteristic and specific mechanisms for the post-translational
processing and modification of proteins. Appropriate cell lines or host systems can be chosen to
ensure the correct modification and processing of the foreign protein expressed. To this end, 
eukaryotic host cells such as 293 cells have already been shown to produce active P-TEFb. 

Expression vectors for use in such mammalian cells ordinarily include an origin of
replication (as necessary), a promoter located in front of the gene to be expressed, along with any
necessary ribosome binding sites, RNA splice sites, polyadenylation site, and transcriptional
terminator sequences. The origin of replication may be provided either by construction of the
vector to include an exogenous origin, such as may be derived from SV40 or other viral (e.g.,
Polyoma, Adeno, VSV, BPV) source, or may be provided by the host cell chromosomal
replication mechanism. If the vector is integrated into the host cell chromosome, the latter is
often sufficient.

The promoters may be derived from the genome of mammalian cells (e.g.,
metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the
vaccinia virus 7.5K promoter). Further, it is also possible, and may be desirable, to utilize 
promoter or control sequences normally associated with the desired P-TEFb or kinase protein 
gene sequence, provided such control sequences are compatible with the host cell systems. 

A number of viral-based expression systems may be utilized. For example, commonly
used promoters are derived from polyoma, Adenovirus 2, and most frequently Simian Virus 40
(SV40). The early and late promoters of SV40 virus are particularly useful because both are
obtained easily from the virus as a fragment which also contains the SV40 viral origin of
replication. Smaller or larger SV40 fragments may also be used, provided there is included the
approximately 250 bp sequence extending from the Hind III site toward the Bgl I site located in
the viral origin of replication.

In cases where an adenovirus is used as an expression vector, the coding sequences may
be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and
tripartite leader sequence. This chimeric gene may then be inserted in the adenovirus genome by
in vitro or in vivo recombination. Insertion in a non-essential region of the viral genome (e.g.,
region E1 or E3) will result in a recombinant virus that is viable and capable of expressing
P-TEFb or kinase proteins in infected hosts.

Specific initiation signals may also be required for efficient translation of P-TEFb or
kinase protein coding sequences. These signals include the ATG initiation codon and adjacent
sequences. Exogenous translational control signals, including the ATG initiation codon, may
additionally need to be provided. One of ordinary skill in the art would readily be capable of
determining this and providing the necessary signals. It is well known that the initiation codon
must be in-frame (or in-phase) with the reading frame of the desired coding sequence to ensure
translation of the entire insert. These exogenous translational control signals and initiation
codons can be of a variety of origins, both natural and synthetic. The efficiency of expression
may be enhanced by the inclusion of appropriate transcription enhancer elements and transcription
terminators.
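
The in-frame requirement for an exogenous initiation codon amounts to a simple modular check, sketched below. The function name, the 0-based indexing convention and the toy construct are assumptions made for illustration and do not correspond to any sequence disclosed herein.

```python
def is_in_frame(construct, atg_index, cds_start):
    """Return True if the ATG at atg_index shares a reading frame with cds_start.

    construct  -- DNA sequence of the expression construct (string)
    atg_index  -- 0-based position of the A of the exogenous ATG
    cds_start  -- 0-based position of the first base of the first codon
                  of the desired coding sequence
    """
    if construct[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("no ATG at the given index")
    # In frame means the two positions are a whole number of codons apart.
    return (cds_start - atg_index) % 3 == 0


toy = "GGATGAAACCCGGG"          # hypothetical construct, ATG at index 2
print(is_in_frame(toy, 2, 5))   # coding sequence begins one codon after the ATG
print(is_in_frame(toy, 2, 6))   # shifted by one base, so out of frame
```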

In eukaryotic expression, one will also typically desire to incorporate into the
transcriptional unit an appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not
contained within the original cloned segment. Typically, the poly-A addition site is placed about
30 to 2000 nucleotides "downstream" of the termination site of the protein at a position prior to
transcription termination.
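
Whether a cloned segment already carries a suitably placed polyadenylation signal can be checked with a short scan, sketched here. The function name, the coordinate convention (stop_end is the 0-based index of the first base after the stop codon) and the toy sequence are assumptions for illustration only.

```python
def find_polya_signals(seq, stop_end, min_dist=30, max_dist=2000):
    """List 0-based positions of AATAAA lying min_dist..max_dist nt past stop_end."""
    seq = seq.upper()
    hits = []
    pos = seq.find("AATAAA", stop_end)
    while pos != -1:
        if min_dist <= pos - stop_end <= max_dist:
            hits.append(pos)
        # Continue from the next base so that overlapping matches are not missed.
        pos = seq.find("AATAAA", pos + 1)
    return hits


# Hypothetical segment: a short reading frame ending in TAA, 40 spacer bases,
# then a poly-A signal.
segment = "ATGAAATAA" + "C" * 40 + "AATAAA" + "C" * 10
print(find_polya_signals(segment, stop_end=9))
```

An empty list would suggest that a polyadenylation site needs to be supplied by the vector.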

For long-term, high-yield production of recombinant P-TEFb or kinase proteins, stable
expression is preferred. For example, cell lines that stably express constructs encoding P-TEFb
or kinase proteins may be engineered. Rather than using expression vectors that contain viral
origins of replication, host cells can be transformed with vectors controlled by appropriate
expression control elements (e.g., promoter, enhancer sequences, transcription terminators,
polyadenylation sites, etc.), and a selectable marker. Following the introduction of foreign DNA,
engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then are
switched to a selective medium. The selectable marker in the recombinant plasmid confers
resistance to the selection and allows cells to stably integrate the plasmid into their chromosomes
and grow to form foci which in turn can be cloned and expanded into cell lines.

A number of selection systems may be used, including, but not limited to, the herpes
simplex virus (HSV) tk, hypoxanthine-guanine phosphoribosyltransferase (hgprt) and adenine
phosphoribosyltransferase (aprt) genes, in tk-, hgprt- or aprt- cells, respectively. Also,
antimetabolite resistance can be used as the basis of selection for dhfr, which confers resistance to
methotrexate; gpt, which confers resistance to mycophenolic acid; neo, which confers resistance to
the aminoglycoside G-418; and hygro, which confers resistance to hygromycin.

Animal cells can be propagated in vitro in two modes: as non-anchorage dependent cells 
growing in suspension throughout the bulk of the culture or as anchorage-dependent cells 
requiring attachment to a solid substrate for their propagation (i.e., a monolayer type of cell
growth). 

Non-anchorage dependent or suspension cultures from continuous established cell lines
are the most widely used means of large scale production of cells and cell products. However,
suspension cultured cells have limitations, such as tumorigenic potential and lower protein
production than adherent cells.

Large scale suspension culture of mammalian cells in stirred tanks is a common method
for production of recombinant proteins. Two suspension culture reactor designs are in wide use:
the stirred reactor and the airlift reactor. The stirred design has successfully been used at an
8000 liter capacity for the production of interferon. Cells are grown in a stainless steel tank with
a height-to-diameter ratio of 1:1 to 3:1. The culture is usually mixed with one or more agitators,
based on bladed disks or marine propeller patterns. Agitator systems offering less shear forces
than blades have been described. Agitation may be driven either directly or indirectly by
magnetically coupled drives. Indirect drives reduce the risk of microbial contamination through
seals on stirrer shafts.

The airlift reactor, also initially described for microbial fermentation and later adapted for 
mammalian culture, relies on a gas stream to both mix and oxygenate the culture. The gas 
stream enters a riser section of the reactor and drives circulation. Gas disengages at the culture
surface, causing denser liquid free of gas bubbles to travel downward in the downcomer section
of the reactor. The main advantage of this design is the simplicity and lack of need for 
mechanical mixing. Typically, the height-to-diameter ratio is 10:1. The airlift reactor scales up 
relatively easily, has good mass transfer of gases and generates relatively low shear forces. 
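
To give a feel for the height-to-diameter ratios quoted for the two reactor designs, the following sketch computes the dimensions of an idealized cylindrical vessel of a given working volume. Treating the vessel as a plain cylinder (no domed ends, no headspace) is an assumption of the sketch, as is the function name; real vessels will differ.

```python
import math


def tank_dimensions(volume_litres, height_to_diameter):
    """Return (diameter_m, height_m) of an ideal cylinder.

    Uses V = pi * (D / 2)**2 * H with H = ratio * D, solved for D.
    """
    if volume_litres <= 0 or height_to_diameter <= 0:
        raise ValueError("volume and ratio must be positive")
    volume_m3 = volume_litres / 1000.0
    diameter = (4.0 * volume_m3 / (math.pi * height_to_diameter)) ** (1.0 / 3.0)
    return diameter, diameter * height_to_diameter


# 8000 liters at the ratios mentioned in the text: 1:1 and 3:1 (stirred), 10:1 (airlift).
for ratio in (1.0, 3.0, 10.0):
    d, h = tank_dimensions(8000, ratio)
    print(f"ratio {ratio:g}:1 -> diameter {d:.2f} m, height {h:.2f} m")
```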

It is contemplated that the P-TEFb subunits or proteins of the invention may be
"overexpressed", i.e., expressed in increased levels relative to their natural expression in cells.
Such overexpression may be assessed by a variety of methods, including radio-labeling and/or
protein purification. However, simple and direct methods are preferred, for example, those
involving SDS/PAGE and protein staining or western blotting, followed by quantitative analyses,
such as densitometric scanning of the resultant gel or blot. A specific increase in the level of the
recombinant protein or peptide in comparison to the level in natural cells is indicative of
overexpression, as is a relative abundance of the specific protein in relation to the other proteins
produced by the host cell and, e.g., visible on a gel.
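
The quantitative comparison described above can be reduced to a ratio of loading-normalized band intensities, as in the following sketch. The function name, the use of arbitrary densitometry units and the example numbers are illustrative assumptions, not measured data.

```python
def fold_overexpression(recombinant_band, natural_band,
                        recombinant_load=1.0, natural_load=1.0):
    """Fold increase of the recombinant band over the natural-cell band.

    Band values are densitometric intensities in arbitrary units; the load
    values are whatever loading control was measured for each lane.
    """
    if min(recombinant_load, natural_load, natural_band) <= 0:
        raise ValueError("loads and the natural band intensity must be positive")
    return (recombinant_band / recombinant_load) / (natural_band / natural_load)


# Hypothetical scan: equal loading, recombinant lane six times more intense.
print(fold_overexpression(900.0, 150.0))
```

A value well above 1 would be consistent with overexpression in the sense used above.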

III. P-TEFb Proteins and Peptides 

The present invention therefore provides purified, and in preferred embodiments, 
substantially purified, P-TEFb proteins, subunits and peptides. The term "purified P-TEFb 
protein, subunit or peptide" as used herein, is intended to refer to a P-TEFb proteinaceous 
composition, isolatable from insect, mammalian, human or recombinant host cells, wherein the
P-TEFb protein, subunit or peptide is prepared at levels that could not be previously obtained 
prior to the cloning of the P-TEFb subunit genes. A purified P-TEFb protein, subunit or peptide 
therefore also refers to a P-TEFb protein, subunit or peptide free from the environment in which 
it naturally occurs. 

P-TEFb proteins may be full length proteins comprising either a small subunit or large
subunit, such as being 404 (SEQ ID NO:2), 1113 (SEQ ID NO:4), 696 (SEQ ID NO:45), 729
(SEQ ID NO:47) or 726 (SEQ ID NO:50) amino acids in length. P-TEFb proteins, polypeptides
and peptides may also be less than full length proteins, such as individual domains, regions or
even epitopic peptides. Where less than full length P-TEFb proteins are concerned, the most
preferred will be those containing predicted immunogenic sites and those containing the
functional domains identified herein. Preferred P-TEFb kinase domains or fragments will be
those sufficient to phosphorylate RNA polymerase II.

Generally, "purified" will refer to a P-TEFb protein, subunit or peptide composition that
has been subjected to fractionation to remove various non-P-TEFb protein, subunit or peptide
components, and which composition substantially retains its P-TEFb activity, as may be assessed 
by phosphorylation of RNA polymerase II and inhibition by DRB. 

Where the term "substantially purified" is used, this will refer to a composition in which
the P-TEFb protein, subunit or peptide forms the major component of the composition, such as
constituting about 50% of the proteins in the composition or more. In preferred embodiments, a
substantially purified protein will constitute more than 60%, 70%, 80%, 90%, 95%, 99% or even
more of the proteins in the composition.
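
The percentage thresholds in this definition can be applied mechanically, as in the sketch below. The function names and the default threshold argument are assumptions chosen for illustration; how the protein masses are measured is left open.

```python
def purity_fraction(target_protein_mass, total_protein_mass):
    """Fraction of total protein in a composition that is the target protein."""
    if total_protein_mass <= 0 or not 0 <= target_protein_mass <= total_protein_mass:
        raise ValueError("masses must satisfy 0 <= target <= total and total > 0")
    return target_protein_mass / total_protein_mass


def is_substantially_purified(fraction, threshold=0.5):
    """Apply the 'about 50% or more' criterion; pass a higher threshold
    (e.g., 0.9) for the preferred embodiments."""
    return fraction >= threshold


frac = purity_fraction(6.0, 10.0)   # hypothetical: 6 of 10 mass units are P-TEFb
print(frac, is_substantially_purified(frac), is_substantially_purified(frac, 0.9))
```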

A polypeptide or protein that is "purified to homogeneity," as applied to the present
invention, means that the polypeptide or protein has a level of purity where the polypeptide or
protein is substantially free from other proteins and biological components. For example, a
purified polypeptide or protein will often be sufficiently free of other protein components so that
degradative sequencing may be performed successfully.

Various methods for quantifying the degree of purification of P-TEFb proteins, subunits 
or peptides will be known to those of skill in the art in light of the present disclosure. These 
include, for example, determining the specific ability to phosphorylate RNA polymerase II of a 
fraction, or assessing the number of polypeptides within a fraction by gel electrophoresis.
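
One conventional way to express the degree of purification from activity measurements is as specific activity and fold purification relative to the starting material. The sketch below assumes activity is reported in arbitrary kinase units and protein in milligrams; the function names and the numbers are illustrative and are not data from the present disclosure.

```python
def specific_activity(total_activity_units, total_protein_mg):
    """Activity per mg of protein in a fraction."""
    if total_protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return total_activity_units / total_protein_mg


def fold_purification(fraction_specific_activity, starting_specific_activity):
    """How many times more active, per mg, a fraction is than the starting material."""
    if starting_specific_activity <= 0:
        raise ValueError("starting specific activity must be positive")
    return fraction_specific_activity / starting_specific_activity


crude = specific_activity(1000.0, 500.0)   # hypothetical crude extract
pooled = specific_activity(600.0, 3.0)     # hypothetical pooled column fraction
print(crude, pooled, fold_purification(pooled, crude))
```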

To purify a P-TEFb protein, subunit or peptide a natural or recombinant composition 
comprising at least some P-TEFb proteins, subunits or peptides will be subjected to fractionation 
to remove various non-P-TEFb components from the composition. Various techniques suitable 
for use in protein purification will be well known to those of skill in the art. These include, for
example, precipitation with ammonium sulfate, PEG, antibodies and the like or by heat
denaturation, followed by centrifugation; chromatography steps such as ion exchange, gel 
filtration, reverse phase, hydroxylapatite, lectin affinity and other affinity chromatography steps; 
isoelectric focusing; gel electrophoresis; and combinations of such and other techniques. 

A specific example presented herein is the purification of a P-TEFb fusion protein using a 
specific binding partner. Such purification methods are routine in the art. As the present 
invention provides DNA sequences for P-TEFb proteins, any fusion protein purification method 
can now be practiced. This is currently exemplified by the generation of a P-TEFb-glutathione 
S-transferase fusion protein, expression in E. coli, and isolation to homogeneity using affinity 
chromatography on glutathione-agarose. 

The exemplary purification method of fractionation via column chromatography, using a 
series of chromatography columns, such as phosphocellulose (P-11), DEAE cellulose, phenyl
sepharose, hydroxylapatite, MonoQ, MonoS and the like, as disclosed herein represents one 
method to prepare a substantially purified P-TEFb protein, subunit or peptide. This method is 
preferred as it results in the substantial purification of the P-TEFb protein, subunit or peptide in 
yields sufficient for further characterization and use. However, given the DNA and proteins 
provided by the present invention, any purification method can now be employed. 

Although preferred for use in certain embodiments, there is no general requirement that 
the P-TEFb protein, subunit or peptide always be provided in their most purified state. Indeed, it 
is contemplated that less substantially purified P-TEFb proteins, subunits or peptides, which are 
nonetheless enriched in P-TEFb compositions, relative to the natural state, will have utility in 
certain embodiments. These include, for example, phosphorylating RNA polymerase II, as may 
be used to examine RNA elongation; and antibody generation where subsequent inhibitor 
screening assays using purified P-TEFb are conducted.

Methods exhibiting a lower degree of relative purification may have advantages in total
recovery of protein product, or in maintaining the activity of an expressed protein. Inactive 
products also have utility in certain embodiments, such as, e.g., in antibody generation. 

IV. P-TEFb Antibodies and Immunological Reagents
A. Epitopic Core Sequences 

Peptides corresponding to one or more antigenic determinants, or "epitopic core regions", 
of P-TEFb proteins or subunits of the present invention can also be prepared. Such peptides 
should generally be at least 7 or 8 amino acid residues in length, will preferably be about 10, 15, 
20, 25 or about 30 amino acid residues in length, and may contain up to about 35-50 residues or
so. 

Synthetic peptides will generally be about, or less than about, 35-36 residues long, which
is the approximate upper length limit of automated peptide synthesis machines, such as those
available from Applied Biosystems (Foster City, CA). Longer peptides may also be prepared,
e.g., by recombinant means.

U.S. Patent 4,554,101 (Hopp), incorporated herein by reference, teaches the identification
and preparation of epitopes from primary amino acid sequences on the basis of hydrophilicity.
Through the methods disclosed in Hopp, one of skill in the art would be able to identify epitopes
from within an amino acid sequence such as the P-TEFb sequences disclosed herein (SEQ ID
NO:2; SEQ ID NO:4; SEQ ID NO:45; SEQ ID NO:47 or SEQ ID NO:50).
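
A hydrophilicity-based scan of the general kind referred to above can be sketched as a sliding-window average over a per-residue scale. The values below are the commonly tabulated Hopp-Woods hydrophilicity values as recalled here and should be checked against the cited patent before use; the six-residue window, the function names and the toy sequence are assumptions of this sketch and are not taken from the present disclosure.

```python
# Per-residue hydrophilicity values (Hopp-Woods scale, as commonly tabulated).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(seq, window=6):
    """Average hydrophilicity of every window-length stretch of seq."""
    seq = seq.upper()
    return [
        sum(HOPP_WOODS[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(seq, window=6):
    """Return (start index, peptide, score) of the highest-scoring window."""
    seq = seq.upper()
    profile = hydrophilicity_profile(seq, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best, seq[best:best + window], profile[best]


# Toy sequence: a charged stretch flanked by leucines.
print(most_hydrophilic_window("LLLLLLKDERKDLLLLLL"))
```

The highest-scoring windows would be candidate epitopic core regions to carry forward for experimental testing.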

Numerous scientific publications have also been devoted to the prediction of secondary 
structure, and to the identification of epitopes, from analyses of amino acid sequences (Chou and
Fasman, 1974a,b; 1978a,b; 1979). Any of these may be used, if desired, to supplement the
teachings of Hopp in U.S. Patent 4,554,101.

Moreover, computer programs are currently available to assist with predicting antigenic
portions and epitopic core regions of proteins. Examples include those programs based upon the
Jameson-Wolf analysis (Jameson and Wolf, 1998; Wolf et al., 1988), the program PepPlot®
(Brutlag et al., 1990; Weinberger et al., 1985), and other new programs for protein tertiary
structure prediction (Fetrow and Bryant, 1993). Further commercially available software
capable of carrying out such analyses is termed MacVector® (IBI, New Haven, CT).

In further embodiments, major antigenic determinants of a polypeptide may be identified
by an empirical approach in which portions of the gene encoding the polypeptide are expressed
in a recombinant host, and the resulting proteins tested for their ability to elicit an immune
response. For example, PCR™ can be used to prepare a range of peptides lacking successively
longer fragments of the C-terminus of the protein. The immunoactivity of each of these peptides
is determined to identify those fragments or domains of the polypeptide that are
immunodominant. Further studies in which only a small number of amino acids are removed at
each iteration then allow the location of the antigenic determinants of the polypeptide to be
more precisely determined.
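
The design of such a truncation series can be sketched as follows, with a coarse step for the first pass and a finer step for the follow-up mapping. The function name, step sizes and placeholder sequence are assumptions for illustration only.

```python
def c_terminal_truncations(protein, step):
    """Peptides lacking successively longer C-terminal fragments.

    Each peptide is `step` residues shorter than the previous one; the
    full-length protein itself is not included.
    """
    if step <= 0:
        raise ValueError("step must be a positive number of residues")
    return [protein[:end] for end in range(len(protein) - step, 0, -step)]


placeholder = "ABCDEFGHIJ"   # stands in for a real amino acid sequence
print(c_terminal_truncations(placeholder, 3))   # coarse first pass
print(c_terminal_truncations(placeholder, 1))   # finer follow-up series
```

Comparing the immunoactivity of neighboring members of the series would indicate which removed segment carried a determinant.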

Once one or more such analyses are completed, polypeptides are prepared that contain at
least the essential features of one or more antigenic determinants. The peptides are then
employed in the generation of antisera against the polypeptide. Minigenes or gene fusions
encoding these determinants can also be constructed and inserted into expression vectors by
standard methods, for example, using PCR™ cloning methodology.



The use of such small peptides for vaccination typically requires conjugation of the 
peptide to an immunogenic carrier protein, such as hepatitis B surface antigen, keyhole limpet 
hemocyanin or bovine serum albumin. Methods for performing this conjugation are well known 
in the art.



B. Antibody Generation 

In certain embodiments, the present invention provides antibodies that bind with high
specificity to P-TEFb, and other antibodies that bind to the protein products of the isolated
nucleic acid sequences of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:43, SEQ ID NO:46 or SEQ
ID NO:48. Antibodies specific for the wild type proteins and peptides and those specific for any
one of a number of mutants are provided. As detailed above, in addition to antibodies generated
against the full length proteins, antibodies may also be generated in response to smaller
constructs comprising epitopic core regions, including wild type and mutant epitopes.

As used herein, the term "antibody" is intended to refer broadly to any immunologic 
binding agent such as IgG, IgM, IgA, IgD and IgE. Generally, IgG and/or IgM are preferred 
because they are the most common antibodies in the physiological situation and because they are 
most easily made in a laboratory setting.

Monoclonal antibodies (MAbs) are recognized to have certain advantages, e.g.,
reproducibility and large-scale production, and their use is generally preferred. The invention
thus provides MAbs of human, murine, monkey, rat, hamster, rabbit and even chicken origin.
Due to the ease of preparation and ready availability of reagents, murine MAbs will often be
preferred.

However, "humanized" antibodies are also contemplated, as are chimeric antibodies from 
mouse, rat, or other species, bearing human constant and/or variable region domains, bispecific 
antibodies, recombinant and engineered antibodies and fragments thereof. Methods for the
development of antibodies that are "custom-tailored" to the patient's tumor are likewise known 
and such custom-tailored antibodies are also contemplated. 

The term "antibody" is used to refer to any antibody-like molecule that has an antigen 
binding region, and includes antibody fragments such as Fab', Fab, F(ab')2, single domain
antibodies (DABs), Fv, scFv (single chain Fv), and the like. The techniques for preparing and 
using various antibody-based constructs and fragments are well known in the art. Means for 
preparing and characterizing antibodies are well known in the art (See, e.g., Antibodies: A 
Laboratory Manual, Cold Spring Harbor Laboratory, 1988; incorporated herein by reference). 

The methods for generating MAbs generally begin along the same lines as those for
preparing polyclonal antibodies. Briefly, a polyclonal antibody is prepared by immunizing an 
animal with an immunogenic P-TEFb composition in accordance with the present invention and 
collecting antisera from that immunized animal. 

A wide range of animal species can be used for the production of antisera. Typically the
animal used for production of antisera is a rabbit, a mouse, a rat, a hamster, a guinea pig or a
goat. Because of the relatively large blood volume of rabbits, a rabbit is a preferred choice for
production of polyclonal antibodies.

As is well known in the art, a given composition may vary in its immunogenicity. It is
often necessary therefore to boost the host immune system, as may be achieved by coupling a
peptide or polypeptide immunogen to a carrier. Exemplary and preferred carriers are keyhole
limpet hemocyanin (KLH) and bovine serum albumin (BSA). Other albumins such as
ovalbumin, mouse serum albumin or rabbit serum albumin can also be used as carriers. Means
for conjugating a polypeptide to a carrier protein are well known in the art and include
glutaraldehyde, m-maleimidobenzoyl-N-hydroxysuccinimide ester, carbodiimide and
bis-diazotized benzidine.

As is also well known in the art, the immunogenicity of a particular immunogen
composition can be enhanced by the use of non-specific stimulators of the immune response,
known as adjuvants. Suitable adjuvants include all acceptable immunostimulatory compounds, 
such as cytokines, toxins or synthetic compositions. 

Adjuvants that may be used include IL-1, IL-2, IL-4, IL-7, IL-12, γ-interferon, GMCSP,
BCG, aluminum hydroxide, MDP compounds, such as thr-MDP and nor-MDP, CGP (MTP-PE),
lipid A, and monophosphoryl lipid A (MPL). RIBI, which contains three components
extracted from bacteria (MPL, trehalose dimycolate (TDM) and cell wall skeleton (CWS)) in a
2% squalene/Tween 80 emulsion, may also be used. MHC antigens may even be used. Exemplary,
often preferred adjuvants include complete Freund's adjuvant (a non-specific stimulator of the
immune response containing killed Mycobacterium tuberculosis), incomplete Freund's adjuvant
and aluminum hydroxide adjuvant.

In addition to adjuvants, it may be desirable to coadminister biologic response modifiers
(BRM), which have been shown to upregulate T cell immunity or downregulate suppressor cell
activity. Such BRMs include, but are not limited to, Cimetidine (CIM; 1200 mg/d)
(Smith/Kline, PA); low-dose Cyclophosphamide (CYP; 300 mg/m²) (Johnson/Mead, NJ);
cytokines such as γ-interferon, IL-2, or IL-12; or genes encoding proteins involved in immune
helper functions, such as B-7.

The amount of immunogen composition used in the production of polyclonal antibodies
varies depending upon the nature of the immunogen as well as the animal used for immunization.
A variety of routes can be used to administer the immunogen (subcutaneous, intramuscular,
intradermal, intravenous and intraperitoneal). The production of polyclonal antibodies may be
monitored by sampling blood of the immunized animal at various points following immunization.

A second, booster injection may also be given. The process of boosting and titering is
repeated until a suitable titer is achieved. When a desired level of immunogenicity is obtained,
the immunized animal can be bled and the serum isolated and stored, and/or the animal can be
used to generate MAbs.

For production of rabbit polyclonal antibodies, the animal can be bled through an ear vein 
or alternatively by cardiac puncture. The removed blood is allowed to coagulate and then 
centrifuged to separate serum components from whole cells and blood clots. The serum may be 
used as is for various applications or else the desired antibody fraction may be purified by
well-known methods, such as affinity chromatography using another antibody, a peptide bound to a
solid matrix, or by using, e.g., protein A or protein G chromatography. 

MAbs may be readily prepared through use of well-known techniques, such as those 
exemplified in U.S. Patent 4,196,265, incorporated herein by reference. Typically, this technique
involves immunizing a suitable animal with a selected immunogen composition, e.g., a purified
or partially purified P-TEFb protein, polypeptide, peptide or domain, be it a wild type or mutant 
composition. The immunizing composition is administered in a manner effective to stimulate 
antibody producing cells. 

The methods for generating MAbs generally begin along the same lines as those for
preparing polyclonal antibodies. Rodents such as mice and rats are preferred animals; however,
the use of rabbit, sheep or frog cells is also possible. The use of rats may provide certain
advantages (Goding, 1986, pp. 60-61), but mice are preferred, with the BALB/c mouse being
most preferred as this is most routinely used and generally gives a higher percentage of stable
fusions.

The animals are injected with antigen, generally as described above. The antigen may be
coupled to carrier molecules such as keyhole limpet hemocyanin if necessary. The antigen
would typically be mixed with adjuvant, such as Freund's complete or incomplete adjuvant.
Booster injections with the same antigen would occur at approximately two-week intervals.

Following immunization, somatic cells with the potential for producing antibodies, 
specifically B lymphocytes (B cells), are selected for use in the MAb generating protocol. These
cells may be obtained from biopsied spleens, tonsils or lymph nodes, or from a peripheral blood
sample. Spleen cells and peripheral blood cells are preferred, the former because they are a rich 
source of antibody-producing cells that are in the dividing plasmablast stage, and the latter 
because peripheral blood is easily accessible. 

Often, a panel of animals will have been immunized and the spleen of the animal with the
highest antibody titer will be removed and the spleen lymphocytes obtained by homogenizing the
spleen with a syringe. Typically, a spleen from an immunized mouse contains approximately
5 × 10⁷ to 2 × 10⁸ lymphocytes.

The antibody-producing B lymphocytes from the immunized animal are then fused with
cells of an immortal myeloma cell line, generally one of the same species as the animal that was
immunized. Myeloma cell lines suited for use in hybridoma-producing fusion procedures
preferably are non-antibody-producing, have high fusion efficiency, and have enzyme deficiencies
that render them incapable of growing in certain selective media which support the growth of only
the desired fused cells (hybridomas).

Any one of a number of myeloma cells may be used, as are known to those of skill in the 
art (Goding, pp. 65-66, 1986; Campbell, pp. 75-83, 1984). For example, where the immunized 
animal is a mouse, one may use P3-X63/Ag8, X63-Ag8.653, NS1/1.Ag 4 1, Sp210-Ag14, FO,
NSO/U, MPC-11, MPC11-X45-GTG 1.7 and S194/5XX0 Bul; for rats, one may use
R210.RCY3, Y3-Ag 1.2.3, IR983F and 4B210; and U-266, GM1500-GRG2, LICR-LON-HMy2 
and UC729-6 are all useful in connection with human cell fusions. 

One preferred murine myeloma cell is the NS-1 myeloma cell line (also termed P3-NS-1-Ag4-1), 
which is readily available from the NIGMS Human Genetic Mutant Cell Repository by 
requesting cell line repository number GM3573. Another mouse myeloma cell line that may be 
used is the 8-azaguanine-resistant mouse myeloma SP2/0 non-producer cell line. 

Methods for generating hybrids of antibody-producing spleen or lymph node cells and 
myeloma cells usually comprise mixing somatic cells with myeloma cells in a 2:1 proportion, 
though the proportion may vary from about 20:1 to about 1:1, respectively, in the presence of an 
agent or agents (chemical or electrical) that promote the fusion of cell membranes. Fusion 
methods using Sendai virus have been described by Kohler and Milstein (1975; 1976), and those 
using polyethylene glycol (PEG), such as 37% (v/v) PEG, by Gefter et al. (1977). The use of 
electrically induced fusion methods is also appropriate (Goding pp. 71-74, 1986). 
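
By way of illustration only, the cell proportions described above may be sketched as a short calculation. The function name and the example cell count are hypothetical and are not taken from any protocol described herein; the sketch merely assumes that the ratio is expressed as somatic cells per myeloma cell.

```python
def myeloma_cells_needed(somatic_cells: float, ratio: float = 2.0) -> float:
    """Return the number of myeloma cells for a given somatic:myeloma ratio.

    `ratio` is somatic cells per myeloma cell. The text gives 2:1 as the
    usual proportion, with a range of about 20:1 to about 1:1.
    """
    if not 1.0 <= ratio <= 20.0:
        raise ValueError("ratio outside the roughly 1:1 to 20:1 range described")
    return somatic_cells / ratio


if __name__ == "__main__":
    # 1e8 somatic cells is an illustrative input only.
    print(myeloma_cells_needed(1e8))        # usual 2:1 proportion
    print(myeloma_cells_needed(1e8, 20.0))  # upper end of the stated range
```

The range check is strict for simplicity; since the text recites "about" 20:1 to "about" 1:1, a practitioner may wish to relax it.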

Fusion procedures usually produce viable hybrids at low frequencies, about 1 x 10^-6 to 
1 x 10^-8. However, this does not pose a problem, as the viable, fused hybrids are differentiated 
from the parental, unfused cells (particularly the unfused myeloma cells that would normally 
continue to divide indefinitely) by culturing in a selective medium. The selective medium is 
generally one that contains an agent that blocks the de novo synthesis of nucleotides in the tissue 
culture media. Exemplary and preferred agents are aminopterin, methotrexate, and azaserine. 
Aminopterin and methotrexate block de novo synthesis of both purines and pyrimidines, whereas 
azaserine blocks only purine synthesis. Where aminopterin or methotrexate is used, the medium is 
supplemented with hypoxanthine and thymidine as a source of nucleotides (HAT medium). 
Where azaserine is used, the medium is supplemented with hypoxanthine. 

The preferred selection medium is HAT. Only cells capable of operating nucleotide 
salvage pathways are able to survive in HAT medium. The myeloma cells are defective in key 
enzymes of the salvage pathway, e.g., hypoxanthine phosphoribosyl transferase (HPRT), and 
they cannot survive. The B cells can operate this pathway, but they have a limited life span in 
culture and generally die within about two weeks. Therefore, the only cells that can survive in 
the selective media are those hybrids formed from myeloma and B cells. 
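
The selection logic of the preceding paragraph may be summarized, purely as an illustrative sketch, by treating survival in HAT medium as requiring both a functional salvage pathway and the capacity for indefinite division. The class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellType:
    name: str
    has_salvage_pathway: bool   # e.g., functional HPRT
    divides_indefinitely: bool  # immortal in culture


def survives_hat(cell: CellType) -> bool:
    """A cell persists in HAT only if it can salvage nucleotides AND is immortal."""
    return cell.has_salvage_pathway and cell.divides_indefinitely


CELLS = [
    # Myeloma cells: immortal, but defective in the salvage pathway.
    CellType("myeloma", has_salvage_pathway=False, divides_indefinitely=True),
    # B cells: salvage pathway intact, but limited life span in culture.
    CellType("B cell", has_salvage_pathway=True, divides_indefinitely=False),
    # Hybridomas inherit one property from each parent.
    CellType("hybridoma", has_salvage_pathway=True, divides_indefinitely=True),
]

if __name__ == "__main__":
    for cell in CELLS:
        print(cell.name, survives_hat(cell))
```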

This culturing provides a population of hybridomas from which specific hybridomas are 
selected. Typically, selection of hybridomas is performed by culturing the cells by single-clone 
dilution in microtiter plates, followed by testing the individual clonal supernatants (after about 
two to three weeks) for the desired reactivity. The assay should be sensitive, simple and rapid, 
such as radioimmunoassays, enzyme immunoassays, cytotoxicity assays, plaque assays, dot 
immunobinding assays, and the like. 
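
As a hedged illustration of single-clone dilution, the seeding of wells may be modeled with a Poisson distribution. This model, the assumption that every seeded cell grows out, and the function names are illustrative assumptions and are not part of the procedure described above.

```python
import math


def poisson_pmf(k: int, mean_cells_per_well: float) -> float:
    """Probability that a well receives exactly k cells under a Poisson model."""
    return math.exp(-mean_cells_per_well) * mean_cells_per_well ** k / math.factorial(k)


def fraction_clonal_among_growing(mean_cells_per_well: float) -> float:
    """Among wells that received at least one cell, the fraction seeded by exactly one."""
    p_one = poisson_pmf(1, mean_cells_per_well)
    p_nonempty = 1.0 - poisson_pmf(0, mean_cells_per_well)
    return p_one / p_nonempty


if __name__ == "__main__":
    # Illustrative seeding densities (mean cells per well).
    for density in (0.3, 1.0, 3.0):
        print(density, round(fraction_clonal_among_growing(density), 3))
```

Under this model, lower seeding densities leave more wells empty but make each growing well more likely to derive from a single cell.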

The selected hybridomas would then be serially diluted and cloned into individual 
antibody-producing cell lines, which clones can then be propagated indefinitely to provide 
MAbs. The cell lines may be exploited for MAb production in two basic ways. 

A sample of the hybridoma can be injected (often into the peritoneal cavity) into a 
histocompatible animal of the type that was used to provide the somatic and myeloma cells for 
the original fusion (e.g., a syngeneic mouse). Optionally, the animals are primed with a 
hydrocarbon, especially oils such as pristane (tetramethylpentadecane) prior to injection. The 
injected animal develops tumors secreting the specific monoclonal antibody produced by the 
fused cell hybrid. The body fluids of the animal, such as serum or ascites fluid, can then be 
tapped to provide MAbs in high concentration. 

The individual cell lines could also be cultured in vitro, where the MAbs are naturally 
secreted into the culture medium from which they can be readily obtained in high concentrations. 

MAbs produced by either means may be further purified, if desired, using filtration, 
centrifugation and various chromatographic methods such as HPLC or affinity chromatography. 
Fragments of the MAbs of the invention can be obtained from the MAbs so produced by methods 
which include digestion with enzymes, such as pepsin or papain, and/or by cleavage of disulfide 
bonds by chemical reduction. Alternatively, monoclonal antibody fragments encompassed by 
the present invention can be synthesized using an automated peptide synthesizer. 

It is also contemplated that a molecular cloning approach may be used to generate 
monoclonals. For this, combinatorial immunoglobulin phagemid libraries are prepared from 
RNA isolated from the spleen of the immunized animal, and phagemids expressing appropriate 
antibodies are selected by panning using cells expressing the antigen and control cells. The 
advantages of this approach over conventional hybridoma techniques are that approximately 10^4 
times as many antibodies can be produced and screened in a single round, and that new 
specificities are generated by H and L chain combination which further increases the chance of 
finding appropriate antibodies. 

Alternatively, monoclonal antibody fragments encompassed by the present invention can 
be synthesized using an automated peptide synthesizer, or by expression of a full-length gene or of 
gene fragments in E. coli. 

C. Antibody Conjugates 

The present invention further provides anti-P-TEFb antibodies, generally of the 
monoclonal type, that are linked to one or more other agents to form an antibody conjugate. Any 
antibody of sufficient selectivity, specificity and affinity may be employed as the basis for an 
antibody conjugate. Such properties may be evaluated using conventional immunological 
screening methodology known to those of skill in the art. 

Certain examples of antibody conjugates are those conjugates in which the antibody is 
linked to a detectable label. "Detectable labels" are compounds or elements that can be detected 
due to their specific functional properties, or chemical characteristics, the use of which allows the 
antibody to which they are attached to be detected, and further quantified if desired. Another 
such example is the formation of a conjugate comprising an antibody linked to a cytotoxic or 
anti-cellular agent, as may be termed "immunotoxins". In the context of the present invention, 
immunotoxins are generally less preferred. 

Antibody conjugates are thus preferred for use as diagnostic agents. Antibody 
diagnostics generally fall within two classes, those for use in in vitro diagnostics, such as in a 
variety of immunoassays, and those for use in in vivo diagnostic protocols, generally known as 
"antibody-directed imaging". Again, antibody-directed imaging is less preferred for use with this 
invention. 

Many appropriate imaging agents are known in the art, as are methods for their 
attachment to antibodies (see, e.g., U.S. patents 5,021,236 and 4,472,509, both incorporated 
herein by reference). Certain attachment methods involve the use of a metal chelate complex 
employing, for example, an organic chelating agent such as DTPA attached to the antibody (U.S. 
Patent 4,472,509). Monoclonal antibodies may also be reacted with an enzyme in the presence 
of a coupling agent such as glutaraldehyde or periodate. Conjugates with fluorescein markers are 
prepared in the presence of these coupling agents or by reaction with an isothiocyanate. 

In the case of paramagnetic ions, one might mention by way of example ions such as 
chromium (III), manganese (II), iron (III), iron (II), cobalt (II), nickel (II), copper (II), 
neodymium (III), samarium (III), ytterbium (III), gadolinium (III), vanadium (II), terbium (III), 
dysprosium (III), holmium (III) and erbium (III), with gadolinium being particularly preferred. 
Ions useful in other contexts, such as X-ray imaging, include but are not limited to lanthanum 
(III), gold (III), lead (II), and especially bismuth (III). 

In the case of radioactive isotopes for therapeutic and/or diagnostic application, one 
might mention astatine-211, carbon-14, chromium-51, chlorine-36, cobalt-57, cobalt-58, copper-67, 
europium-152, gallium-67, hydrogen-3, iodine-123, iodine-125, iodine-131, indium-111, iron-59, 
phosphorus-32, rhenium-186, rhenium-188, selenium-75, sulphur-35, technetium-99m and yttrium-90. 
Iodine-125 is often preferred for use in certain embodiments, and technetium-99m and indium-111 
are also often preferred due to their low energy and suitability for long range detection. 

Radioactively labeled MAbs of the present invention may be produced according to 
well-known methods in the art. For instance, MAbs can be iodinated by contact with sodium or 
potassium iodide and a chemical oxidizing agent such as sodium hypochlorite, or an enzymatic 
oxidizing agent, such as lactoperoxidase. Monoclonal antibodies according to the invention may 
be labeled with technetium-99m by a ligand exchange process, for example, by reducing 
pertechnetate with stannous solution, chelating the reduced technetium onto a Sephadex column 
and applying the antibody to this column, or by direct labeling techniques, e.g., by incubating 
pertechnetate, a reducing agent such as SnCl2, a buffer solution such as sodium-potassium 
phthalate solution, and the antibody. 

Intermediary functional groups which are often used to bind radioisotopes which exist as 
metallic ions to antibody are diethylenetriaminepentaacetic acid (DTPA) and 
ethylenediaminetetraacetic acid (EDTA). Fluorescent labels include rhodamine, fluorescein isothiocyanate 
and renographin. 

The much preferred antibody conjugates of the present invention are those intended 
primarily for use in vitro, where the antibody is linked to a secondary binding ligand or to an 
enzyme (an enzyme tag) that will generate a colored product upon contact with a chromogenic 
substrate. Examples of suitable enzymes include urease, alkaline phosphatase, (horseradish) 
hydrogen peroxidase and glucose oxidase. Preferred secondary binding ligands are biotin and 
avidin or streptavidin compounds. The use of such labels is well known to those of skill in the 
art and is described, for example, in U.S. Patents 3,817,837; 3,850,752; 3,939,350; 
3,996,345; 4,277,437; 4,275,149 and 4,366,241; each incorporated herein by reference. 

D. Immunodetection Methods and Kits 

In still further embodiments, the present invention concerns immunodetection methods 
for binding, purifying, removing, quantifying or otherwise generally detecting biological 
components, such as HIV Tat, that interact with P-TEFb. The binding assays will often be used 
as a pre-screen to identify agents that inhibit productive RNA elongation, and especially to 
identify agents that inhibit viral-mediated RNA elongation by disrupting the interaction of viral 
transactivating proteins, such as Tat, that need to interact with P-TEFb in order to exert their 
effect on the host cell RNA polymerase II. Subsequent to the identification of components that 
inhibit the binding of a viral protein to P-TEFb in an immunodetection assay, the compounds 
will generally be further examined in RNA elongation activity assays to confirm their suitability. 
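
Where a binding assay is used as a pre-screen in the manner just described, the effect of a candidate agent might be expressed as a percent inhibition of the binding signal. The following is a minimal sketch under that assumption; the formula is a common convention rather than one recited herein, and the function and parameter names are hypothetical.

```python
def percent_inhibition(signal_with_agent: float,
                       signal_no_agent: float,
                       background: float = 0.0) -> float:
    """Percent reduction in specific binding signal caused by a test agent.

    All three readings are assumed to be in the same units (for example,
    absorbance), with `background` measured in the absence of one binding
    partner.
    """
    specific_control = signal_no_agent - background
    if specific_control <= 0:
        raise ValueError("control signal must exceed background")
    specific_test = signal_with_agent - background
    return 100.0 * (1.0 - specific_test / specific_control)


if __name__ == "__main__":
    # Illustrative readings only; not experimental data.
    print(percent_inhibition(0.6, 1.1, background=0.1))
```

Agents scoring well in such a pre-screen would, as stated above, still be examined in RNA elongation activity assays.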

Immunodetection methods may also be employed to identify novel proteins or protein 
domains that bind to P-TEFb, and/or to identify a new property of a known protein, allowing the 
definition of that protein as a P-TEFb-binding protein. In any event, the identification of an 
additional viral protein that binds to P-TEFb will allow the identified protein to become the 
target of new anti-viral strategies. The first step of such anti-viral approaches will, again, often 
be based upon a binding assay in which potential anti-viral compounds are tested for their ability 
to inhibit the binding of the newly discovered protein to P-TEFb. 

Accordingly, in the following description of binding assays, the term "viral 
transactivating protein" is used to refer to any known or discovered viral transactivating protein 
or domain that binds to P-TEFb and, most preferably, that binds to human P-TEFb. Examples of 
viral transactivating proteins that are herein discovered to bind to P-TEFb are HIV Tat and VP16 
from herpes virus. E1A from adenovirus is also contemplated to be an appropriate target for use 
in such a binding assay. Naturally, where the intent of the assay is to identify viral proteins that 
bind to P-TEFb, the "viral transactivating protein" will more properly be termed a "viral 
composition comprising a candidate transactivating protein", which is generally exemplified by 
extracts comprising viral proteins produced during host cell infection. 

Antibodies and labeled antibodies against the P-TEFb proteins, subunits or peptides of 
the present invention may also be employed to purify P-TEFb, as may be used in purification 
from insect, mammalian or human sources, including recombinant host cells, via immunoaffinity 
protocols, such as immunoaffinity chromatography. In the purification methods, the antibody 
removes or purifies the antigenic P-TEFb component from a sample. The antibody will 
preferably be linked to a solid support, such as in the form of a column matrix, and the sample 
suspected of containing the P-TEFb antigenic component will be applied to the immobilized 
antibody. The unwanted components will be washed from the column, leaving the antigen 
immunocomplexed to the immobilized antibody, which P-TEFb antigen is then collected by 
removing the P-TEFb from the column. 

Antibodies and labeled antibodies against the P-TEFb proteins, subunits or peptides of 
the invention may even further be employed to detect P-TEFb in samples, including recombinant 
host cells and clinical samples. Such antibodies may ultimately be employed in diagnostic 
embodiments to detect increased or decreased levels of P-TEFb or to detect mutant P-TEFb 
proteins, subunits or peptides in human tissue or fluid samples. As P-TEFb is centrally 
important to RNA elongation, disturbances in P-TEFb levels or types are likely to be implicated 
in diseases, such as cancer, in which gene expression is perturbed. Therefore, the use of wild 
type- and mutant-specific antibodies is contemplated. However, these embodiments also have 
applications to non-clinical samples, such as in the titering of antigen or antibody samples, in the 
selection of hybridomas, and the like. 

Accordingly, detection of P-TEFb alone is another important aspect of the invention. In 
general, simple P-TEFb immunobinding methods include obtaining a sample suspected of 
containing a P-TEFb protein, subunit, peptide, or mutant thereof, and contacting the sample with 
a first anti-P-TEFb antibody or anti-mutant P-TEFb antibody in accordance with the present 
invention, under conditions effective to allow the formation of immunocomplexes, and then 
detecting the immunocomplexes so formed. 

In the clinical diagnosis or monitoring of patients with diseases such as cancer, the 
detection of a P-TEFb mutant, or an alteration in the levels of P-TEFb, in comparison to the 
levels in a corresponding biological sample from a normal subject will be indicative of a patient 
with the disease or cancer. However, as is known to those of skill in the art, such a clinical 
diagnosis would not necessarily be made on the basis of this method in isolation. Those of skill 
in the art are very familiar with differentiating between significant differences in types or 
amounts of biomarkers, which represent a positive identification, and low level or background 
changes of biomarkers. Indeed, background expression levels are often used to form a "cut-off" 
above which increased detection will be scored as significant or positive. 
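
One way in which background levels might be used to form such a cut-off is sketched below. The rule used here (mean of background readings plus a chosen number of standard deviations) is a common convention adopted for illustration only; neither the rule, the default of three standard deviations, nor the function names are specified herein.

```python
from statistics import mean, stdev


def cutoff_from_background(background_signals, n_sd: float = 3.0) -> float:
    """Cut-off = mean of background readings plus n_sd sample standard deviations."""
    return mean(background_signals) + n_sd * stdev(background_signals)


def score(signal: float, cutoff: float) -> str:
    """Score a reading as positive only when it exceeds the cut-off."""
    return "positive" if signal > cutoff else "negative"


if __name__ == "__main__":
    # Illustrative background readings only; not experimental data.
    background = [0.10, 0.12, 0.08, 0.10]
    cutoff = cutoff_from_background(background)
    for reading in (0.11, 0.50):
        print(reading, score(reading, cutoff))
```

As noted above, a score of this kind would not necessarily support a clinical diagnosis in isolation.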

The P-TEFb-viral protein immunobinding methods of the present invention generally 
include obtaining a sample suspected of containing a viral transactivating protein, domain, 
subunit or active peptide, and contacting the sample with a first P-TEFb composition, complex, 
protein, subunit, domain or binding peptide, in accordance with the present invention, under 
conditions effective to allow the formation of bound protein complexes, and then detecting the 
bound protein complexes so formed. The detection of the "bound protein complexes", in the 
manner described herein, is highly analogous to the techniques of immunocomplex detection, 
even in embodiments where either the viral transactivating protein or the P-TEFb protein is 
directly labeled, and all such technology regarding immobilization, specific binding, non-specific 
binding and washing is applicable. 

The steps of various useful immunodetection methods have been described in the 
scientific literature, such as, e.g., Nakamura et al. (1987), incorporated herein by reference. In 
terms of detecting or quantifying the amount of viral transactivating proteins, i.e., P-TEFb 
reactive components in a sample, the methods require the detection or quantification of any 
bound protein or immune complexes formed during the binding process. Here, one would obtain 
a sample known or suspected to contain a viral transactivating protein and contact the sample 
with a P-TEFb protein, subunit or peptide, and then detect or quantify the amount of bound 
protein complexes formed under the specific conditions. 

Contacting the chosen viral transactivating sample with the P-TEFb sample will be 
effected under conditions effective and for a period of time sufficient to allow the formation of 
bound protein complexes (primary complexes), which is generally a matter of simply adding the 
two compositions or samples and incubating the mixture for a period of time long enough for any 
viral transactivating proteins and P-TEFb proteins present to form bound protein complexes. 
After this time, the potentially bound protein complexes, as may be present on an ELISA plate, 
dot blot or such like, will generally be washed to remove any non-specifically bound protein 
species, allowing only those species specifically bound within the primary complexes to be 
detected. 

In general, the detection of bound protein complexes is analogous to the detection of 
immunocomplexes, and is well known to those of skill in the art. Detection may be achieved 
through the application of numerous approaches. These methods are generally based upon the 
detection of a label or marker, such as any of those radioactive, fluorescent, biological or 
enzymatic tags. U.S. Patents concerning the use of such labels include 3,817,837; 3,850,752; 
3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241, each incorporated herein by 
reference. 

Preferably, in detection, one would find additional advantages through the use of an 
antibody, or other secondary binding ligand, such as a second biotin/avidin ligand binding 
arrangement, as is known in the art. The antibody employed in the detection may be an 
anti-P-TEFb antibody or an anti-viral protein antibody, such as an anti-HIV Tat antibody. Such 
antibodies would be termed "primary antibodies". The primary antibodies may themselves be 
linked to a detectable label, wherein one would then simply detect this label, thereby allowing 
the amount of the bound protein complexes in the composition to be determined. 

Alternatively, the first anti-P-TEFb antibody or anti-viral protein antibody (such as 
anti-HIV Tat) that becomes bound within the protein complexes may be detected by means of a 
second binding ligand that has binding affinity for the first antibody. In these cases, the second 
binding ligand may be linked to a detectable label. The second binding ligand is itself often an 
antibody, which may thus be termed a "secondary" antibody. The primary bound 
protein-antibody complexes are contacted with the labeled, secondary binding ligand, or antibody, under 
conditions effective and for a period of time sufficient to allow the formation of secondary 
immune complexes. The secondary immune complexes are then generally washed to remove 
any non-specifically bound labeled secondary antibodies or ligands, and the remaining label in 
the secondary immune complexes is then detected. Further methods including the use of tertiary 
binding ligands or antibodies linked to a detectable label are also contemplated, particularly 
where signal amplification is desired. 

1. ELISAs 

As detailed above, immunoassays, in their most simple and direct sense, are binding 
assays. Certain preferred immunoassays are the various types of enzyme linked immunosorbent 
assays (ELISAs) and radioimmunoassays (RIA) known in the art. Immunohistochemical 
detection using tissue sections is also particularly useful. However, it will be readily appreciated 
that detection is not limited to such techniques, and western blotting, dot blotting, FACS 
analyses, and the like may also be used. 

In one exemplary ELISA, either P-TEFb or a viral transactivating protein, such as HIV 
Tat, is immobilized onto a selected surface exhibiting protein affinity, such as a well in a 
polystyrene microtiter plate. Then, a composition containing the counterpart viral transactivating 
protein, or P-TEFb, is added to the wells. After binding and washing to remove non-specifically 
bound complexes, the bound P-TEFb-viral protein complex may be detected. Detection is 
generally achieved by the addition of an anti-P-TEFb or anti-viral protein antibody that is linked 
to a detectable label. Detection may also be achieved by the addition of a first anti-P-TEFb or 
anti-viral protein antibody, followed by a second antibody that has binding affinity for the first 
antibody, with the second antibody being linked to a detectable label. 

Irrespective of the format employed, ELISAs have certain features in common, such as 
coating, incubating or binding, washing to remove non-specifically bound species, and detecting 
the bound immune complexes. These are described as follows: 

In coating a plate with either P-TEFb or a viral transactivating protein, such as Tat, VP16 
or E1A, one will generally incubate the wells of the plate with a solution of the agent, either 
overnight or for a specified period of hours. The wells of the plate will then be washed to 
remove incompletely adsorbed material. Any remaining available surfaces of the wells are then 
"coated" with a nonspecific protein that is neutral with regard to binding to the biological 
components. These include bovine serum albumin (BSA), casein and solutions of milk powder. 
The coating allows for blocking of nonspecific adsorption sites on the immobilizing surface and 
thus reduces the background caused by nonspecific binding of proteins onto the surface. 

In the ELISAs of the present invention it will probably be more customary to use a 
secondary or tertiary detection means rather than a direct procedure using a labeled P-TEFb or a 
viral transactivating protein. Thus, after binding of the first protein to the well, coating with a 
non-reactive material to reduce background, and washing to remove unbound material, the 
immobilizing surface is contacted with the second biological protein under conditions effective 
to allow protein complex formation. Detection of the complex then requires a labeled binding 
ligand or antibody. 

"Under conditions effective to allow protein complex formation" means that the 
conditions preferably include diluting the P-TEFb and viral transactivating proteins, with 
solutions such as BSA, bovine gamma globulin (BGG) and phosphate buffered saline 
(PBS)/Tween. These added agents also tend to assist in the reduction of nonspecific background. 
The "suitable" conditions also mean that the incubation is at a temperature and for a period of 
time sufficient to allow effective binding. Incubation steps are typically from about 1 or 2 hours up to 
about 4 hours, at temperatures preferably on the order of 25° to 27°C, or may be overnight at about 4°C 
or so. 

Following all incubation steps in an ELISA, the contacted surface is washed so as to 
remove non-complexed material. A preferred washing procedure includes washing with a 
solution such as PBS/Tween, or borate buffer. Following the formation of specific complexes 
between the test sample and the originally bound material, and subsequent washing, the 
occurrence of even minute amounts of bound complexes may be determined. 

To provide for detection, a first or second antibody will preferably be provided that has 
an associated label to allow detection. However, given that the present invention provides highly 
purified P-TEFb components, and as viral transactivating proteins, such as HIV Tat, are readily 
available, the P-TEFb or viral protein may also be directly labeled. 

Preferably, the label will be an enzyme that will generate color development upon 
incubating with an appropriate chromogenic substrate. Thus, for example, one will desire to 
contact and incubate the bound complexes with a urease, glucose oxidase, alkaline phosphatase 
or hydrogen peroxidase-conjugated antibody for a period of time and under conditions that favor 
the development of immunocomplex formation (e.g., incubation for 2 hours at room temperature 
in a PBS-containing solution such as PBS-Tween). 

After incubation with the labeled antibody, and subsequent to washing to remove 
unbound material, the amount of label is quantified, e.g., by incubation with a chromogenic 
substrate such as urea and bromocresol purple or 2,2'-azino-di-(3-ethyl-benzthiazoline-6-sulfonic 
acid) [ABTS] and H2O2, in the case of peroxidase as the enzyme label. Quantification is then 
achieved by measuring the degree of color generation, e.g., using a visible spectra 
spectrophotometer. 

In still further embodiments, the present invention concerns immunodetection kits for use 
with the immunodetection methods described above. P-TEFb, a viral transactivating protein and, 
preferably, a labeled antibody to at least one of such components will most generally be included 
in the kit. However, kits including fewer than or more than the foregoing components may also be 
provided. In preferred embodiments of the methods and kits, the antibodies used will be 
monoclonal antibodies (MAbs). 

The immunodetection kits will comprise all the supplied components in suitable 
container means. In certain embodiments, the kits may comprise a P-TEFb or a viral 
transactivating protein that is pre-bound to a solid support, such as a column matrix or well of a 
microtiter plate. 

The immunodetection reagents of the kit may take any one of a variety of forms, 
including those detectable labels that are associated with or linked to the given P-TEFb or viral 
transactivating protein. Detectable labels that are associated with or attached to a secondary 
binding ligand or antibody are generally preferred. As noted above, a number of exemplary 
labels are known in the art and all such labels may be employed in connection with the present 
invention. 

The kits may further comprise suitably aliquoted compositions of a P-TEFb protein or 
polypeptide or a viral transactivating protein, whether labeled or unlabeled, as may be used to 
prepare a standard curve for a detection assay. The kits may also contain protein- or 
antibody-label conjugates either in fully conjugated form, in the form of intermediates, or as separate 
moieties to be conjugated by the user of the kit. The components of the kits may be packaged 
either in aqueous media or in lyophilized form. 
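
The use of aliquoted protein to prepare a standard curve may be illustrated, in a hedged manner, by linear interpolation between standards. The interpolation scheme, the function name and the example values are assumptions made for illustration and are not prescribed herein.

```python
from bisect import bisect_left


def interpolate_concentration(absorbance, standards):
    """Estimate concentration by linear interpolation on a standard curve.

    `standards` is a list of (concentration, absorbance) pairs, assumed to
    have distinct absorbances that increase with concentration over the
    range used.
    """
    pts = sorted(standards, key=lambda p: p[1])
    readings = [a for _, a in pts]
    if not readings[0] <= absorbance <= readings[-1]:
        raise ValueError("absorbance outside the range of the standards")
    i = bisect_left(readings, absorbance)
    if readings[i] == absorbance:
        return pts[i][0]
    (c0, a0), (c1, a1) = pts[i - 1], pts[i]
    return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)


if __name__ == "__main__":
    # Illustrative standards (concentration, absorbance); not measured data.
    curve = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45)]
    print(interpolate_concentration(0.35, curve))
```

Readings outside the range of the standards are rejected rather than extrapolated, which is a deliberately conservative design choice.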

The container means of the kits will generally include at least one vial, test tube, flask, 
bottle, syringe or other container means, into which the P-TEFb or viral transactivating protein, 
and other optional components, may be placed, and preferably, suitably aliquoted. Where second 
and third additional binding components are provided, the kit will also generally contain a 
second, third or other additional container into which such components may be placed. The kits 
of the present invention will also typically include a means for containing the reagent containers 
in close confinement for commercial sale. Such containers may include injection or 
blow-molded plastic containers in which the desired vials are retained. 

V. Biological Functional Equivalents 

As will be understood by those of skill in the art, modifications and changes may be made 
in the structure of the P-TEFb kinase and large subunits and still obtain a molecule having like or 
otherwise desirable characteristics. For example, certain amino acids may be substituted for 
other amino acids in a protein structure without appreciable loss of interactive binding capacity 
with structures such as, for example, antigen-binding regions of antibodies or binding sites on 
molecules such as Tat and RNA polymerase II. Since it is the interactive capacity and nature of a 
protein that defines that protein's biological functional activity, certain amino acid sequence 
substitutions can be made in a protein sequence (or, of course, its underlying DNA coding 
sequence) and nevertheless obtain a protein with like (agonistic) properties. It is thus 
contemplated by the inventor that various changes may be made in the sequence of P-TEFb 
proteins or peptides (or underlying DNA) without appreciable loss of their biological utility or 
activity. 

Equally, the same considerations may be employed to create a P-TEFb protein or peptide 
with countervailing (e.g., antagonistic) properties. This is relevant to the present invention in 
which P-TEFb analogues without Tat binding activity are contemplated to be useful in inhibiting 
the ability of the HIV virus to promote viral RNA elongation by way of inhibiting Tat binding to 
P-TEFb. 

In terms of functional equivalents, it is also well understood by the skilled artisan that, 
inherent in the definition of a biologically functional equivalent protein or peptide, is the concept 
that there is a limit to the number of changes that may be made within a defined portion of the 
molecule and still result in a molecule with an acceptable level of equivalent biological activity. 
Biologically functional equivalent peptides are thus defined herein as those peptides in which 
certain, not most or all, of the amino acids may be substituted. In particular, where small 
peptides are concerned, fewer amino acids may be changed. Of course, a plurality of distinct 
proteins/peptides with different substitutions may easily be made and used in accordance with 
the invention. 

It is also well understood that where certain residues are shown to be particularly 
important to the biological or structural properties of a protein or peptide, e.g., residues in the 
active site of an enzyme, or in the RNA polymerase II binding region, such residues may not 
generally be exchanged. This is the case in the present invention, where residues in the active
site of the kinase subunit should not generally be changed where it is the intention to maintain
kinase function. 

Amino acid substitutions are generally based on the relative similarity of the amino acid 
side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the
like. An analysis of the size, shape and type of the amino acid side-chain substituents reveals
that arginine, lysine and histidine are all positively charged residues; that alanine, glycine and
serine are all a similar size; and that phenylalanine, tryptophan and tyrosine all have a generally
similar shape. Therefore, based upon these considerations, arginine, lysine and histidine;
alanine, glycine and serine; and phenylalanine, tryptophan and tyrosine are defined herein as
biologically functional equivalents.

To effect more quantitative changes, the hydropathic index of amino acids may be 
considered. Each amino acid has been assigned a hydropathic index on the basis of its
hydrophobicity and charge characteristics; these are: isoleucine (+4.5); valine (+4.2); leucine
(+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine
(-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine
(-3.2); glutamate (-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and
arginine (-4.5).

The importance of the hydropathic amino acid index in conferring interactive biological
function on a protein is generally understood in the art (Kyte and Doolittle, 1982, incorporated
herein by reference). It is known that certain amino acids may be substituted for other amino
acids having a similar hydropathic index or score and still retain a similar biological activity. In
making changes based upon the hydropathic index, the substitution of amino acids whose
hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly 
preferred, and those within ±0.5 are even more particularly preferred. 
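
By way of illustration only, the hydropathic index comparison described above may be
sketched as a short Python routine. The index values are those listed above; the function
name, the three-letter residue keys and the label returned for differences greater than 2
are assumptions made for this sketch and form no part of the disclosure.

```python
# Hydropathic indices as listed in the text (Kyte and Doolittle, 1982).
HYDROPATHY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def hydropathy_preference(original, substitute):
    """Grade a substitution by the absolute difference in hydropathic index."""
    # Round to suppress floating-point noise such as 0.7000000000000002.
    diff = round(abs(HYDROPATHY[original] - HYDROPATHY[substitute]), 6)
    if diff <= 0.5:
        return "even more particularly preferred"
    if diff <= 1.0:
        return "particularly preferred"
    if diff <= 2.0:
        return "preferred"
    # Hypothetical label: the text names no category beyond +/-2.
    return "outside the stated ranges"


if __name__ == "__main__":
    for pair in [("Val", "Ile"), ("Leu", "Ile"), ("Leu", "Met"), ("Ala", "Gly")]:
        print(pair, hydropathy_preference(*pair))
```

Such a check addresses hydropathic similarity only; it does not take account of residues
that should not generally be exchanged, such as those in the active site of the kinase
subunit.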

It is also understood in the art that the substitution of like amino acids can be made 
effectively on the basis of hydrophilicity, particularly where the biological functional equivalent
protein or peptide thereby created is intended for use in immunological embodiments, as in the
present case. U.S. Patent 4,554,101, incorporated herein by reference, states that the greatest
local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino
acids, correlates with its immunogenicity and antigenicity, i.e., with a biological property of the
protein.

As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been 
assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate 
(+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); 
proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5);
leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4). 

In making changes based upon similar hydrophilicity values, the substitution of amino 
acids whose hydrophilicity values are within ±2 is preferred, those which are within ±1 are 
particularly preferred, and those within ±0.5 are even more particularly preferred.
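
As a non-limiting sketch, the "greatest local average hydrophilicity" referred to above may
be located by averaging the listed hydrophilicity values over a sliding window of adjacent
residues. The six-residue window, the use of the central value for residues listed with a
± 1 range, the one-letter residue codes and the example sequence are all assumptions made
for this illustration.

```python
# Hydrophilicity values as listed in the text (U.S. Patent 4,554,101).
# Assumption: Asp, Glu and Pro use the central value of their "+/- 1" range.
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def most_hydrophilic_window(sequence, window=6):
    """Return (start, segment, mean) for the window with the greatest mean value."""
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the averaging window")
    best = None
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(HYDROPHILICITY[aa] for aa in segment) / window
        # Strict comparison keeps the first window when two windows tie.
        if best is None or mean > best[2]:
            best = (start, segment, mean)
    return best


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the function.
    print(most_hydrophilic_window("LLLLLLKKDEKRLLLLLL"))
```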

While discussion has focused on functionally equivalent polypeptides arising from amino 
acid changes, it will be appreciated that these changes may be effected by alteration of the 
encoding DNA, taking into consideration also that the genetic code is degenerate and that two or
more codons may code for the same amino acid. A table of amino acids and their codons is
presented hereinabove for use in such embodiments, as well as for other uses, such as in the
design of probes and primers and the like. 

VI. Inhibitors and Screening Assays 
In still further embodiments, the present invention provides methods for identifying new
P-TEFb inhibitory compounds, which may be termed "candidate substances." It is
contemplated that such screening techniques will prove useful in the general identification of any 
compound that will serve the purpose of inhibiting P-TEFb, and in preferred embodiments, will 
provide candidate anti-viral compounds. 


It is further contemplated that useful compounds in this regard will in no way be limited 
to proteinaceous or peptidyl compounds. In fact, it may prove to be the case that the most useful
pharmacological compounds identified through application of the screening assays will be
non-peptidyl in nature and will, e.g., serve to inhibit viral RNA elongation through a tight
binding or other chemical interaction. Candidate substances may be obtained from libraries of
synthetic chemicals, or from natural samples, such as rain forest and marine samples. 

A. Inhibition of P-TEFb Phosphorylation 

RNA polymerase II is the natural substrate of the P-TEFb kinase and enzyme complex. 
Although phosphorylation of the CTD of RNA polymerase II is complicated in cellular terms,
using phosphorylation of RNA polymerase II as an assay to identify inhibitors of P-TEFb will be 
straightforward in light of the methods disclosed herein. The inhibitors initially identified in 
such assays can be used as general transcription elongation inhibitors, or may be modified to 
prepare second generation compounds for use as specific viral transcription elongation inhibitors. 


General transcription elongation inhibitors will have utility in cellular assays, and are also 
contemplated for therapeutic uses. For example, the ability to generally inhibit transcription 
elongation is envisioned to be useful in controlling excessive gene transcription/translation and 
cellular proliferation, as may be used to treat cancer and other diseases associated with aberrant 
gene expression and/or cellular reproduction and proliferation. It is well understood in the art
that many anti-cancer therapeutics, as well as anti-viral therapeutics, can exert certain effects in
normal cell types, but such potential side-effects do not generally limit the therapeutic utility of
such drugs. Further, the impact of any possible adverse effects can be limited or otherwise
controlled by the more specific administration of the inhibitory agent to a disease localized site
or area of the body, such as by direct application to a tumor.

To identify a P-TEFb kinase inhibitor and potential transcription elongation inhibitor 
using a P-TEFb RNA polymerase II phosphorylation assay, one would simply conduct parallel or 
otherwise comparatively controlled phosphorylation assays and identify a compound that inhibits 
phosphorylation. The candidate screening assay is quite simple to set up and perform. Thus,
after obtaining a relatively purified preparation of the kinase or intact P-TEFb enzyme, either 
from native or recombinant sources, and either from Drosophila or human sources, one will 
simply admix a candidate substance with the kinase preparation, under conditions that would 
allow the enzyme to perform its P-TEFb function but for inclusion of an inhibitory substance. 


For example, one will typically desire to include within the admixture an amount of
RNA polymerase II, although other phosphate acceptors may be used, such as phosphorylatable
peptides. Potential effectors, such as an HIV Tat protein, may also be included in the assay. In
any event, one would measure the ability of the candidate substance to reduce phosphorylation of
the RNA polymerase II or other phosphate acceptor substrate. In general, one will desire to
measure or otherwise determine the activity of the relatively purified enzyme in the absence of
the added candidate substance relative to the activity in the presence of the candidate substance
in order to assess the relative inhibitory capability of the candidate substance.
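
Purely as an illustrative sketch, the comparison of kinase activity in the absence and in
the presence of a candidate substance may be expressed as a percent inhibition. The
function name, the optional background correction and the numerical readings are
hypothetical and are not taken from the assays described herein.

```python
def percent_inhibition(signal_without, signal_with, background=0.0):
    """Percent reduction in substrate phosphorylation caused by a candidate.

    signal_without: phosphorylation signal with no candidate added (control).
    signal_with:    phosphorylation signal in the presence of the candidate.
    background:     signal from a no-kinase blank (assumed, optional).
    """
    control = signal_without - background
    treated = signal_with - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - treated / control)


if __name__ == "__main__":
    # Hypothetical counts from parallel reactions run under identical conditions.
    print(percent_inhibition(1100.0, 350.0, background=100.0))
```

The same calculation may be applied to a reference inhibitor run in parallel, so that the
candidate and the reference are compared under the same set of conditions.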


In terms of optimizing phosphorylation-based inhibitor assays, one should consider the 
following scientific observations. Each phosphorylation site is unique and may be influenced by 
phosphorylation of other sites. It is possible to saturate the assay with kinase such that even in 
the presence of inhibitors, all large subunit molecules are completely shifted to the IIo form. In
addition, the exact relationship of the mobility of the large subunit to the number of phosphates
added is not known. The levels of phosphorylation required for function during transcription
should be determined and the effect that other factors might have on the kinase action in the 
elongation complex should also be considered. 

Therefore, the best correlation between kinase activity and function during transcription
can be made by comparing the effects of two inhibitors under the same set of conditions. This
does not pose a problem for conducting the P-TEFb phosphorylation-based inhibitor assays of
the invention as DRB, and even H-8, are provided as known kinase inhibitors that also inhibit
productive transcription elongation. Accordingly, in performing a P-TEFb phosphorylation-based
inhibitor assay, one may advantageously compare the effect of a candidate inhibitor with
the effect of DRB, or even H-8, under the same set of conditions.

Prior to the present invention, there was continued confusion in the literature concerning 
the target for the action of the kinase inhibitor, DRB. DRB inhibits the CTD kinase activity of
Drosophila P-TEFb and its ability to phosphorylate the CTD in EECs. The inventor has shown
that under identical conditions the CTD kinase activity of P-TEFb is inhibited by DRB while that
of TFIIH is unaffected by the drug. Yankulov et al. proposed that the effect of DRB on
transcription was due to inhibition of the kinase activity of TFIIH (Yankulov et al., 1995). The
Yankulov conclusions were based on the observations that DRB inhibited the TFIIH kinase and
the appearance of runoff transcripts during transcription with similar dose-response curves. The
Yankulov kinase and transcription assays were performed under conditions that did not allow a 
valid comparison of inhibition curves to be made. Although recent studies strongly implicate 
TFIIH in elongation control (Yankulov et al, 1996), results with the Drosophila system indicate 
that TFIIH cannot substitute for the role of P-TEFb in controlling elongation. 


The DRB-sensitive process whereby RNA polymerase II escapes abortive elongation and 
enters a productive mode of elongation is important in controlling transcription. The drugs DRB 
or H-8 inhibit the transition into productive elongation as well as the CTD kinase activity of 
P-TEFb. In addition, the present invention discloses that pure P-TEFb can phosphorylate the 
CTD of RNA polymerase II in early elongation complexes. Post-initiation phosphorylation of
the CTD by P-TEFb controls the transition into productive elongation. P-TEFb may remain
associated with the complex or it may be released, but the complex is DRB resistant after that
point (Kephart et al., 1992).

The present invention further discloses that the concentration of ATP used in
phosphorylation assays has a direct effect on the concentration of inhibitor required to achieve
50% inhibition. This should be considered in conducting phosphorylation assays, and in
correlating kinase assays, which generally use low ATP concentrations, with transcription assays,
which have higher levels of all NTPs. Those of ordinary skill in the art will understand that the
concentrations of the polymerase and kinase, the time of the reactions and other assay conditions
should be carefully monitored in analyzing the influence of potential inhibitors.
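
One way to visualize the dependence of the 50% inhibitory concentration on ATP is the
standard Cheng-Prusoff relation for a competitive inhibitor, IC50 = Ki x (1 + [ATP]/Km).
The sketch below assumes, solely for illustration, that the inhibitor competes with ATP
and that the kinase follows simple Michaelis-Menten kinetics; neither assumption nor any
of the constants shown is taken from the present disclosure.

```python
def apparent_ic50(ki, atp, km_atp):
    """Cheng-Prusoff estimate of IC50 for an ATP-competitive inhibitor.

    ki, atp and km_atp must share the same concentration unit (e.g. micromolar).
    """
    if km_atp <= 0:
        raise ValueError("Km for ATP must be positive")
    return ki * (1.0 + atp / km_atp)


if __name__ == "__main__":
    # Hypothetical constants: Ki = 1, Km(ATP) = 10, in arbitrary matching units.
    low_atp = apparent_ic50(ki=1.0, atp=10.0, km_atp=10.0)    # kinase-assay-like
    high_atp = apparent_ic50(ki=1.0, atp=600.0, km_atp=10.0)  # transcription-like
    print(low_atp, high_atp)
```

Under these assumptions, the same inhibitor appears markedly less potent at the higher
ATP level, which is why inhibition curves obtained at different ATP concentrations are not
directly comparable.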

B. Inhibition of P-TEFb-Viral Protein Binding

Potential viral transcription elongation inhibitors can also be identified by conducting a 
P-TEFb-viral protein binding assay, such as any of the assays described herein. Identifying a
potential viral transcription elongation inhibitor using a P-TEFb-viral protein binding assay
likewise requires only that one conduct parallel or otherwise comparatively controlled binding
assays and identify a compound that inhibits P-TEFb-viral protein binding.

Such candidate inhibitor screening assays are also simple to set up and perform. One will
simply admix a candidate substance with the P-TEFb and viral protein combination, under
conditions that would normally allow the P-TEFb and viral proteins to bind but for inclusion of
an inhibitory substance. One measures the ability of the candidate substance to reduce binding of
the P-TEFb and viral proteins. In general,
one again measures or otherwise determines, e.g., from a known standard curve, the binding of
the same amount of the proteins in the absence of the added candidate substance relative to the
binding in the presence of the candidate substance in order to assess the relative inhibitory
capability of the candidate substance.

C. Inhibition of P-TEFb-Viral-Mediated Transcription Elongation

The invention next provides methods for assaying for, or for confirming the identification 
of, candidate viral transcription inhibitors, based upon the use of viral transcription elongation 
assays. To identify a viral transcription elongation inhibitor in this manner, one would again 
conduct parallel or otherwise comparatively controlled transcription elongation assays and
identify a compound that inhibits transcription elongation, and preferably, that inhibits 
transcription elongation of viral transcripts and not human transcripts. 

In these assays, one would admix a candidate substance with a transcriptionally 
competent composition under conditions that would normally result in the generation of
elongated human and viral RNA transcripts but for inclusion of an inhibitory substance. One
would then identify a positive inhibitory substance as one that prevented or significantly reduced
the generation of elongated RNA transcripts upon addition to the assay system. Most preferably,
one would identify a positive anti-viral inhibitory substance as one that prevented or significantly
reduced the generation of elongated viral RNA transcripts, but that did not prevent or
significantly reduce the generation of elongated human RNA transcripts, upon addition to the 
assay system. 
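
The selection logic described above may be sketched, by way of example only, as a simple
classification of a candidate from the levels of elongated viral and human transcripts
measured with and without the candidate. The function name, the category labels and the
50% cut-off standing in for "significantly reduced" are hypothetical choices made for this
sketch.

```python
def classify_candidate(viral_control, viral_treated,
                       human_control, human_treated, threshold=0.5):
    """Classify a candidate from elongated-transcript levels.

    A transcript class counts as inhibited when the treated/control ratio is at
    or below `threshold` (an assumed stand-in for "significantly reduced").
    """
    if viral_control <= 0 or human_control <= 0:
        raise ValueError("control transcript levels must be positive")
    viral_inhibited = (viral_treated / viral_control) <= threshold
    human_inhibited = (human_treated / human_control) <= threshold
    if viral_inhibited and not human_inhibited:
        return "selective anti-viral inhibitor"
    if viral_inhibited and human_inhibited:
        return "general elongation inhibitor"
    if human_inhibited:
        return "inhibits human transcripts only"
    return "no significant inhibition"


if __name__ == "__main__":
    # Hypothetical transcript levels in arbitrary units.
    print(classify_candidate(100.0, 10.0, 100.0, 90.0))
```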

Assays of this type are described herein and are generally performed using effective
amounts of human and viral nucleic acid templates, viral transcriptional transactivator proteins,
P-TEFb enzyme complex, RNA polymerase II and each of the four nucleotides, which, of course,
include ATP that provides the energy. Preferably, other partially purified nuclear components are
included in the assay, as disclosed herein in the Detailed Examples. The productive viral RNA
elongation is measured in the presence of the candidate substance, generally after
establishing the basal levels of productive viral RNA elongation in the absence of the candidate
substance. In this manner, one can assess the relative inhibitory capability of the candidate
substance.

D. Second Generation Inhibitors

In addition to the inhibitory compounds initially identified, the inventor also 
contemplates that other sterically similar compounds may be formulated to mimic the key 
portions of the structure of the inhibitors. Such compounds, which may include peptidomimetics 
of peptide inhibitors, may be used in the same manner as the initial inhibitors.

Certain mimetics that mimic elements of protein secondary structure are designed using 
the rationale that the peptide backbone of proteins exists chiefly to orientate amino acid side 
chains in such a way as to facilitate molecular interactions. A peptide mimetic is thus designed 
to permit molecular interactions similar to the natural molecule.

Some successful applications of the peptide mimetic concept have focused on mimetics
of β-turns within proteins, which are known to be highly antigenic. Likely β-turn structure
within a polypeptide can be predicted by computer-based algorithms, as discussed herein. Once
the component amino acids of the turn are determined, mimetics can be constructed to achieve a
similar spatial orientation of the essential elements of the amino acid side chains.

The generation of further structural equivalents or mimetics may be achieved by the
techniques of modeling and chemical design known to those of skill in the art. The art of
computer-based chemical modeling is now well known. Using such methods, a chemical that
specifically inhibits viral transcription elongation can be designed, and then synthesized,
following the initial identification of a compound that inhibits RNA elongation, but that is not
specific or sufficiently specific to inhibit viral RNA elongation in preference to human RNA
elongation. It will be understood that all such sterically similar constructs and second generation
molecules fall within the scope of the present invention.

E. Methods of Inhibiting P-TEFb

In still further embodiments, the present invention is concerned with methods of 
inhibiting productive RNA elongation, and preferably, with preferentially inhibiting productive
viral RNA elongation. These methods generally comprise exposing an RNA elongation complex
to an effective concentration of a P-TEFb or elongation inhibitor identified in accordance with
the candidate screening assay embodiments of the present invention. Where productive viral
RNA elongation is to be inhibited, the RNA elongation complex is a complex that comprises
viral nucleic acids and transcriptional transactivator proteins, which complex is often located
within a virally infected cell. 

This aspect of the invention is intended for use in inhibiting the P-TEFb-mediated viral 
transcription that occurs in various retroviral and other viral infections, such as HIV and HSV 
infections, and even in adenoviral infections. It is contemplated that the use of such inhibitors to
block the transactivating functions of the Tat protein in infected cells of patients suffering with
AIDS or HIV-infected states will be useful in itself and/or in conjunction with other anti-viral
therapies. Inhibitors designed using the dual viral inhibition assay will be particularly useful in the
treatment of AIDS because the normal cellular function of P-TEFb should not be influenced. 


It is also contemplated that P-TEFb will be involved in controlling cellular proliferation. 
A number of oncogenes are controlled at the elongation phase of transcription and P-TEFb is a 
key player in this control. The small subunit of human P-TEFb has been localized to a region of 
a human chromosome that contains a gene involved in cancer. The inventor has also found that 
P-TEFb levels correlate with the growth rate of cells. Mutations in either subunit of P-TEFb
could have profound effects on cellular proliferation. Gene therapy is therefore contemplated
for treatment of conditions that result from P-TEFb mutations. Drugs that affect the cellular 
function of P-TEFb are contemplated in the treatment of certain types of cancer, irrespective of 
their specificity for inhibiting certain transcript formation. 


VII. Pharmaceutical Compositions 

A. Pharmaceutically Acceptable Carriers 

Aqueous compositions of the present invention comprise an effective amount of one or 
more of the inhibitors of the invention dissolved or dispersed in a pharmaceutically acceptable 
carrier or aqueous medium. The phrases "pharmaceutically or pharmacologically acceptable"
refer to molecular entities and compositions that do not produce an adverse, allergic or other
untoward reaction when administered to an animal, or a human, as appropriate. 

As used herein, "pharmaceutically acceptable carrier" includes any and all solvents, 
dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying
agents and the like. The use of such media and agents for pharmaceutical active substances is
well known in the art. Except insofar as any conventional media or agent is incompatible with
the active ingredient, its use in the therapeutic compositions is contemplated. Supplementary
active ingredients can also be incorporated into the compositions. For human administration,
preparations should meet sterility, pyrogenicity, general safety and purity standards as required
by FDA Office of Biologics standards.

The biological material should be extensively dialyzed to remove undesired small 
molecular weight molecules and/or lyophilized for more ready formulation into a desired vehicle,
where appropriate. The active compounds will then generally be formulated for parenteral
administration, e.g., formulated for injection via the intravenous, intramuscular, subcutaneous,
intralesional, or even intraperitoneal routes. The preparation of an aqueous composition that
contains an RNA transcription elongation inhibitor as an active component or ingredient
will be known to those of skill in the art in light of the present disclosure. Typically, such
compositions can be prepared as injectables, either as liquid solutions or suspensions; solid forms
suitable for using to prepare solutions or suspensions upon the addition of a liquid prior to 
injection can also be prepared; and the preparations can also be emulsified. 

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions or 
dispersions; formulations including sesame oil, peanut oil or aqueous propylene glycol; and
sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. 
In all cases the form must be sterile and must be fluid to the extent that easy syringability exists. 
It must be stable under the conditions of manufacture and storage and must be preserved against 
the contaminating action of microorganisms, such as bacteria and fungi. 

Solutions of the inhibitory compounds as free base or pharmacologically acceptable salts
can be prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose.
Dispersions can also be prepared in glycerol, liquid polyethylene glycols, and mixtures thereof
and in oils. Under ordinary conditions of storage and use, these preparations contain a
preservative to prevent the growth of microorganisms.

An inhibitor or antagonist of the present invention can be formulated into a composition 
in a neutral or salt form. Pharmaceutically acceptable salts include the acid addition salts
(formed with the free amino groups of the protein), which are formed with inorganic acids
such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic,
tartaric, mandelic, and the like. Salts formed with the free carboxyl groups can also be derived 
from inorganic bases such as, for example, sodium, potassium, ammonium, calcium, or ferric 
hydroxides, and such organic bases as isopropylamine, trimethylamine, histidine, procaine and 
the like. 


The carrier can also be a solvent or dispersion medium containing, for example, water, 
ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the 
like), suitable mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for 
example, by the use of a coating, such as lecithin, by the maintenance of the required particle size
in the case of dispersion and by the use of surfactants. The prevention of the action of
microorganisms can be brought about by various antibacterial and antifungal agents, for example,
parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and the like. In many cases, it will be
preferable to include isotonic agents, for example, sugars or sodium chloride. Prolonged
absorption of the injectable compositions can be brought about by the use in the compositions of
agents delaying absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active inhibitory compounds 
in the required amount in the appropriate solvent with various of the other ingredients 
enumerated above, as required, followed by filter sterilization. Generally, dispersions are
prepared by incorporating the various sterilized active ingredients into a sterile vehicle which
contains the basic dispersion medium and the required other ingredients from those enumerated
above. In the case of sterile powders for the preparation of sterile injectable solutions, the
preferred methods of preparation are vacuum-drying and freeze-drying techniques which yield a
powder of the active ingredient plus any additional desired ingredient from a previously
sterile-filtered solution thereof.

In terms of using peptide inhibitors as active ingredients, the technology of U.S. Patents 
4,608,251; 4,601,903; 4,599,231; 4,599,230; 4,596,792; and 4,578,770, each incorporated herein 
by reference, may be used. The preparation of more, or highly, concentrated solutions for direct 
injection is also contemplated, where the use of DMSO as solvent is envisioned to result in 
extremely rapid penetration, delivering high concentrations of the active agents to a small body 
area. 

Upon formulation, solutions will be administered in a manner compatible with the dosage 
formulation and in such amount as is therapeutically effective. The formulations are easily 
administered in a variety of dosage forms, such as the type of injectable solutions described 
above, but drug release capsules and the like can also be employed. 

For parenteral administration in an aqueous solution, for example, the solution should be 
suitably buffered if necessary and the liquid diluent first rendered isotonic with sufficient saline 
or glucose. These particular aqueous solutions are especially suitable for intravenous, 
intramuscular, subcutaneous and intraperitoneal administration. In this connection, sterile 
aqueous media which can be employed will be known to those of skill in the art in light of the 
present disclosure. For example, one dosage could be dissolved in 1 ml of isotonic NaCl 
solution and either added to 1000 ml of hypodermoclysis fluid or injected at the proposed site of 
infusion, (see for example, "Remington's Pharmaceutical Sciences" 15th Edition, pages 1035- 
1038 and 1570-1580). Some variation in dosage will necessarily occur depending on the 
condition of the subject being treated. The person responsible for administration will, in any 
event, determine the appropriate dose for the individual subject.

The active inhibitors or agents may be formulated within a therapeutic mixture to
comprise about 0.0001 to 1.0 milligrams, or about 0.001 to 0.1 milligrams, or about 0.1 to 1.0 or
even about 10 milligrams per dose or so. Multiple doses can also be administered.

In addition to the compounds formulated for parenteral administration, such as 
intravenous or intramuscular injection, other pharmaceutically acceptable forms include, e.g., 
tablets or other solids for oral administration; liposomal formulations; time release capsules; and 
any other form currently used, including creams.

Additional formulations which are suitable for other modes of administration include 
suppositories. For suppositories, traditional binders and carriers may include, for example, 
polyalkylene glycols or triglycerides; such suppositories may be formed from mixtures 
containing the active ingredient in the range of 0.5% to 10%, preferably 1%-2%.

Oral formulations include such normally employed excipients as, for example, 
pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharin,
cellulose, magnesium carbonate and the like. These compositions take the form of solutions, 
suspensions, tablets, pills, capsules, sustained release formulations or powders. 

In certain defined embodiments, oral pharmaceutical compositions will comprise an inert 
diluent or assimilable edible carrier, or they may be enclosed in hard or soft shell gelatin capsules,
or they may be compressed into tablets, or they may be incorporated directly with the food of the
diet. For oral therapeutic administration, the active compounds may be incorporated with
excipients and used in the form of ingestible tablets, buccal tablets, troches, capsules, elixirs,
suspensions, syrups, wafers, and the like. Such compositions and preparations should contain at 
least 0.1% of active compound. The percentage of the compositions and preparations may, of 
course, be varied and may conveniently be between about 2 to about 75% of the weight of the 
unit, or preferably between 25-60%. The amount of active compounds in such therapeutically 
useful compositions is such that a suitable dosage will be obtained.

The tablets, troches, pills, capsules and the like may also contain the following: a binder,
such as gum tragacanth, acacia, cornstarch, or gelatin; excipients, such as dicalcium phosphate; a
disintegrating agent, such as corn starch, potato starch, alginic acid and the like; a lubricant, such
as magnesium stearate; and a sweetening agent, such as sucrose, lactose or saccharin, may be
added or a flavoring agent, such as peppermint, oil of wintergreen, or cherry flavoring. When the
dosage unit form is a capsule, it may contain, in addition to materials of the above type, a liquid
carrier. Various other materials may be present as coatings or to otherwise modify the physical
form of the dosage unit. For instance, tablets, pills, or capsules may be coated with shellac, sugar
or both. A syrup or elixir may contain the active compounds, sucrose as a sweetening agent,
methyl and propylparabens as preservatives, a dye and flavoring, such as cherry or orange flavor.

It will naturally be understood that suppositories, for example, will not generally be 
contemplated for use in treating all viral infections. However, in the event that the inhibitors of 
the invention, or those identified by the screening methods of the present invention, are 
confirmed as being useful in connection with a particular viral infection of a given tissue or
organ, then other routes of administration and pharmaceutical compositions will be more
relevant. As such, inhalants, tablets, ophthalmic solutions and other formulations will be
appropriate. 

B. Liposomes and Nanocapsules

In certain embodiments, the use of liposomes and/or nanoparticles is contemplated for the 
introduction of the inhibitors of the invention into host cells. The formation and use of liposomes 
is generally known to those of skill in the art, and is also described below. 

Nanocapsules can generally entrap compounds in a stable and reproducible way. To
avoid side effects due to intracellular polymeric overloading, such ultrafine particles (sized
around 0.1 μm) should be designed using polymers able to be degraded in vivo. Biodegradable
polyalkyl-cyanoacrylate nanoparticles that meet these requirements are contemplated for use in
the present invention, and such particles may be easily made.

Liposomes are formed from phospholipids that are dispersed in an aqueous medium and
spontaneously form multilamellar concentric bilayer vesicles (also termed multilamellar vesicles
(MLVs)). MLVs generally have diameters of from 25 nm to 4 μm. Sonication of MLVs results
in the formation of small unilamellar vesicles (SUVs) with diameters in the range of 200 to
500 Å, containing an aqueous solution in the core.

The following information may also be utilized in generating liposomal formulations. 
Phospholipids can form a variety of structures other than liposomes when dispersed in water, 
depending on the molar ratio of lipid to water. At low ratios the liposome is the preferred 
structure. The physical characteristics of liposomes depend on pH, ionic strength and the 
presence of divalent cations. Liposomes can show low permeability to ionic and polar 
substances, but at elevated temperatures undergo a phase transition which markedly alters their 
permeability. The phase transition involves a change from a closely packed, ordered structure, 
known as the gel state, to a loosely packed, less-ordered structure, known as the fluid state. This 
occurs at a characteristic phase-transition temperature and results in an increase in permeability 
to ions, sugars and drugs. 

Liposomes interact with cells via four different mechanisms: endocytosis by phagocytic 
cells of the reticuloendothelial system such as macrophages and neutrophils; adsorption to the 
cell surface, either by nonspecific weak hydrophobic or electrostatic forces, or by specific 
interactions with cell-surface components; fusion with the plasma cell membrane by insertion of 
the lipid bilayer of the liposome into the plasma membrane, with simultaneous release of 
liposomal contents into the cytoplasm; and transfer of liposomal lipids to cellular or 
subcellular membranes, or vice versa, without any association of the liposome contents. Varying 
the liposome formulation can alter which mechanism is operative, although more than one may 
operate at the same time. 

C. Kits 

Therapeutic kits of the present invention are kits comprising an inhibitor of productive RNA 
elongation, including productive elongation of viral RNA. Such kits will generally contain, in suitable container means, 
a pharmaceutically acceptable formulation of an inhibitor, or even a gene expressing an inhibitor 
of a xenobiotic activator of productive RNA elongation in a pharmaceutically acceptable 
formulation, optionally comprising other anti-viral agents. The kit may have a single container 
means, or it may have distinct container means for each compound. 

When the components of the kit are provided in one or more liquid solutions, the liquid 
solution is an aqueous solution, with a sterile aqueous solution being particularly preferred. 
Compositions of the inhibitors of viral and other xenobiotic activators of productive RNA 
elongation may also be formulated into a syringeable composition, in which case the container 
means may itself be a syringe, pipette, or other such like apparatus, from which the formulation 
may be applied to an infected area of the body, injected into an animal, or even applied to and 
mixed with the other components of the kit. 

However, the components of the kit may be provided as dried powder(s). When reagents 
or components are provided as a dry powder, the powder can be reconstituted by the addition of a 
suitable solvent. It is envisioned that the solvent may also be provided in another container 
means. 

The container means will generally include at least one vial, test tube, flask, bottle, 
syringe or other container means, into which the inhibitor of viral and other xenobiotic activators 
of productive RNA elongation is placed and, preferably, suitably allocated. Where a second 
anti-viral therapeutic is provided, the kit will also generally contain a second vial or other container 
into which this agent may be placed. The kits may also comprise a second/third container means 
for containing a sterile, pharmaceutically acceptable buffer or other diluent. 

The kits of the present invention will also typically include a means for containing the 
vials in close confinement for commercial sale, such as, e.g., injection or blow-molded plastic 
containers into which the desired vials are retained. 

Irrespective of the number or type of containers, the kits of the invention may also 
comprise, or be packaged with, an instrument for assisting with the injection/administration or 
placement of the ultimate inhibitor composition within the body of an animal. Such an 
instrument may be a syringe, pipette, forceps, or any such medically approved delivery vehicle. 

The following examples are included to demonstrate preferred embodiments of the 
invention. It should be appreciated by those of skill in the art that the techniques disclosed in the 
examples which follow represent techniques discovered by the inventor to function well in the 
practice of the invention, and thus can be considered to constitute preferred modes for its 
practice. However, those of skill in the art should, in light of the present disclosure, appreciate 
that many changes can be made in the specific embodiments which are disclosed and still obtain 
a like or similar result without departing from the spirit and scope of the invention. 

EXAMPLE 1 

Purification of P-TEFb, the Limiting Factor in RNA Elongation 

The reconstruction of DRB-sensitive transcription involves three chromatographically 
distinct protein fractions: P-TEFa, P-TEFb and factor 2. The present example shows the 
purification of P-TEFb to near homogeneity and demonstrates that P-TEFb is the only factor 
strictly required for the transition into productive elongation using an accepted partially 
fractionated transcription system. 

A. Materials and Methods: 
1. Materials 

[α-32P]CTP (3000 Ci/mmol) was obtained from ICN (Irvine, CA). Ribonucleoside 
triphosphates were obtained from Pharmacia-LKB Biotechnology (Piscataway, NJ). DRB 
(Sigma, St. Louis, MO) was dissolved in ethanol to 20 mM and stored at -80°C. 
Phosphocellulose (P-11) and DEAE cellulose (DE-52) were obtained from Whatman (Hillsboro, 
OR) and prepared according to Price et al. (1987). Phenyl Sepharose (Pharmacia) was prepared 
according to manufacturer's instructions. All other chemicals were reagent grade. 


2. Chromatography Methods and Initial Fractionation 

All chromatography was carried out at 2-4°C. The conductivity of extracts and column 
fractions was determined with a conductivity meter. A standard curve of conductivity versus 
mM KCl in HGKEDP (25 mM HEPES, 15% glycerol, indicated concentration of KCl, 0.1 mM 
EDTA, 1 mM DTT, 0.1% of a saturated solution of phenylmethylsulfonyl fluoride in 
isopropanol) was used to convert the conductivity measurements into KCl concentration 
equivalents. Phosphocellulose was obtained from Whatman and was prepared according to the 
manufacturer's instructions except that the volumes of base and acid washes were reduced from 
25 volumes of resin to 5 volumes. Before the resin was stored the KCl concentration was 
adjusted to 100 mM and the pH was adjusted to 7.6 as measured by a pH meter. 
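
By way of illustration only, the conversion of a conductivity reading into KCl concentration equivalents by means of a standard curve may be sketched as follows. The calibration values, the units of the meter reading (mS/cm) and all function names are hypothetical and are not taken from the present disclosure; a linear standard curve is assumed.

```python
# Sketch: convert a conductivity reading to a KCl-equivalent concentration
# using a linear standard curve. All calibration values are hypothetical.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kcl_equivalent_mM(conductivity, slope, intercept):
    """Invert the standard curve: meter reading -> mM KCl equivalents."""
    return (conductivity - intercept) / slope


# Hypothetical standards: mM KCl in HGKEDP versus meter reading (mS/cm).
STANDARD_KCL_MM = [50.0, 100.0, 200.0, 400.0]
STANDARD_MS_CM = [2.0, 3.5, 6.5, 12.5]

SLOPE, INTERCEPT = fit_line(STANDARD_KCL_MM, STANDARD_MS_CM)

if __name__ == "__main__":
    reading = 5.0  # mS/cm, hypothetical column fraction
    print(round(kcl_equivalent_mM(reading, SLOPE, INTERCEPT), 1))
```

In practice the curve would be prepared with the same buffer and at the same temperature as the fractions being measured, since both affect conductivity.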

All columns were run in HGKEDP. The Kc cell extract (100 ml at 50 mg/ml protein, see 
Example 3, Methods, Section A.4 for the preparation of Kc cell extracts) was loaded onto a 
500 ml phosphocellulose column at 0.25 column volumes per h. After washing the column with 
2 column volumes of 100 mM HGKEDP the column was eluted successively with two column 
volumes each of 0.3, 0.4 and 0.75 M HGKEDP. The 0.1 M HGKEDP 
phosphocellulose flowthrough was brought to 0.3 M and passed through DEAE-cellulose to 
remove the majority of nucleic acid content before further fractionation. The DEAE flowthrough 
was loaded onto an 8 ml FPLC Mono Q™ column (Pharmacia) and the proteins were eluted with 
a 20 column volume gradient from 0.1 to 0.5 M HGKEDP. 
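
For orientation, the salt concentration delivered at any point of a linear gradient such as the one described above can be estimated as sketched below. This is a simplified illustration that assumes an ideal linear gradient and ignores system dead volume and mixing delay; the function names are hypothetical.

```python
# Sketch: ideal linear salt gradient, ignoring dead volume and mixing delay.

def gradient_concentration(volume_ml, column_volume_ml, gradient_cv, start_M, end_M):
    """Salt concentration (M) after `volume_ml` of a linear gradient."""
    total_ml = column_volume_ml * gradient_cv
    if not 0 <= volume_ml <= total_ml:
        raise ValueError("volume lies outside the gradient")
    fraction = volume_ml / total_ml
    return start_M + fraction * (end_M - start_M)


def volume_at_concentration(target_M, column_volume_ml, gradient_cv, start_M, end_M):
    """Gradient volume (ml) at which the target concentration is reached."""
    total_ml = column_volume_ml * gradient_cv
    return total_ml * (target_M - start_M) / (end_M - start_M)


if __name__ == "__main__":
    # 8 ml column, 20 column volumes, 0.1 to 0.5 M (values from the text above).
    print(gradient_concentration(80.0, 8.0, 20.0, 0.1, 0.5))
    print(volume_at_concentration(0.35, 8.0, 20.0, 0.1, 0.5))
```

Under these assumptions an activity eluting from 0.35 M would be expected to begin emerging roughly 100 ml into the 160 ml gradient; measured conductivity of the fractions remains the definitive readout.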

The DNase inhibitor, an activity necessary for in vitro transcription (Biochimie 
69:1199-1205, 1987), eluted between 0.29 and 0.31 M HGKEDP and P-TEFa eluted between 0.35 and 0.4 
M HGKEDP. The 0.3 M HGKEDP step from phosphocellulose was dialyzed to 0.1 M 
HGKEDP and loaded onto a 100 ml DEAE cellulose column. A gradient elution from 0.1 to 0.5 
M HGKEDP was performed. TFIIE eluted at 0.12 to 0.15 M HGKEDP, RNA polymerase II at 
0.25 to 0.37 M HGKEDP, and factor 2 at 0.18 to 0.22 M HGKEDP. The 0.4 M HGKEDP step 
from phosphocellulose was concentrated using Centricon-30™ concentrators (Amicon) and then 
dialyzed versus 75 mM HGKEDP for 2 h. The 0.75 M HGKEDP step from phosphocellulose 
was dialyzed to 0.15 M HGKEDP, and passed through a 100 ml DEAE-cellulose column. This 
DEAE flowthrough was loaded onto an 8 ml FPLC Mono S™ column equilibrated in 0.15 M 
HGKEDP. The Mono S™ column was subjected to a gradient elution from 0.15 to 0.5 M 
HGKEDP. P-TEFb eluted between 0.25 and 0.29 M HGKEDP. 


3. Purification of P-TEFb 

The purification of P-TEFb was carried out twice from 70 to 90 ml of KcN and once from 
155 ml of P-11 0.75 M HGKEDP step generated during the fractionation of Drosophila 
embryonic nuclear extract (obtained from W. Zehring, Wayne State University). Although the 
exact chromatography steps (described below) differed during the three fractionations, the 
behavior of the P-TEFb activity from both sources was nearly identical. Two schemes used to 
purify the P-TEFb used in this work are summarized below. 

Scheme 1: P-TEFb was purified from 90 ml of KcN (approximately 3.0 g protein). 
P-TEFb eluted between 0.55 and 0.65 M HGKEDP during gradient elution of a 500 ml P-11 
column from 0.15 M to 1.0 M HGKEDP. P-TEFb containing P-11 fractions (250 ml) were 
adjusted to 0.5 M HGAEDP (HGKEDP with ammonium sulfate, (NH4)2SO4, substituted for KCl) 
followed by loading onto a 26 ml Phenyl Sepharose™ column which was eluted with a gradient 
from 0.5 M to 0 M HGAEDP. P-TEFb eluted from Phenyl Sepharose between 0.12 M and 0 M 
HGAEDP. P-TEFb containing fractions (23 ml) at 0.06 M HGAEDP (equivalent to 150 mM 
HGKEDP by conductivity) were pooled and allowed to flow through an 8 ml Mono Q™ 
equilibrated in 0.15 M HGKEDP directly onto a 1 ml Mono S™ column. The Mono S™ column 
was then eluted with a gradient from 0.15 M to 0.45 M HGKEDP during which P-TEFb eluted in 
2.5 ml (50 mg protein) between 0.25 M and 0.29 M HGKEDP. A 200 µl sample from Mono S™ 
fraction 30 was loaded onto a 4.25 ml, 18%-35% glycerol gradient with a 500 µl 1 M HGKE 
overlay and centrifuged at 55,000 rpm (287,000 g(av)) in a Beckman SW 55 Ti rotor at 1°C for 
44 h. 

Scheme 2: P-TEFb was purified from 155 ml (265 mg protein) of P-11 0.4 M to 0.75 M 
step from embryonic nuclear extract. The initial P-11 step fraction was adjusted to 0.75 M 
HGAEDP before loading onto a 26 ml Phenyl Sepharose which was then gradient eluted from 
0.5 M to 0 M HGAEDP. P-TEFb activity eluted between 0.12 M and 0 M HGAEDP in 17 ml 
and was then dialyzed to 175 mM HGKEDP before being passed through a 5.0 ml DE-52 
column. The DE-52 flowthrough (19 ml, 3.4 mg protein) was loaded directly onto a 1 ml Mono 
S™ column which was gradient eluted from 175 mM to 500 mM HGKEDP. P-TEFb activity 
eluting between 0.25 M and 0.29 M HGKEDP was dialyzed and loaded onto a 1 ml Mono Q™ 
column at 50 mM HGKEDP, followed by gradient elution from 50 mM to 450 mM HGKEDP. 
P-TEFb activity was found in both the column flowthrough and early gradient fractions. Both 
pools of P-TEFb activity were combined and rechromatographed over a 1 ml Mono S™ column 
loaded at 75 mM and step eluted at 400 mM HGKEDP. P-TEFb eluted in two 0.2 ml fractions. 
A 125 µl sample from one 0.2 ml fraction was loaded onto a 5 ml, 15%-35% glycerol gradient 
and centrifuged at 55,000 rpm (287,000 g(av)) in a Beckman SW 55 Ti rotor at 1°C for 40.5 h. 
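
The relationship between the rotor speed and the average centrifugal force quoted for the glycerol gradients can be checked with the standard relation RCF = 1.118 x 10^-5 x r(cm) x rpm^2, sketched below. The average radius of 84.6 mm used here for the SW 55 Ti rotor is an assumption drawn from general rotor specifications and is not stated in the present disclosure; the function name is hypothetical.

```python
# Sketch: relative centrifugal force from rotor speed and radius.

def rcf(rpm, radius_mm):
    """Relative centrifugal force (multiples of g) at the given radius."""
    radius_cm = radius_mm / 10.0
    return 1.118e-5 * radius_cm * rpm ** 2


if __name__ == "__main__":
    ASSUMED_R_AV_MM = 84.6  # assumed average radius, not from the disclosure
    print(round(rcf(55000, ASSUMED_R_AV_MM)))
```

With the assumed radius, the result is roughly 286,000 x g, which agrees with the stated 287,000 g(av) to within rounding.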

4. In Vitro Transcription 

Two general types of transcription reactions were performed both using the actin Act5C 
template (Kephart et al., 1992) linearized with Hpa I. Pulse-chase reactions were generally as 
described previously (Marshall and Price, 1992) and began with a 10 min preincubation (6 
µl/reaction) containing 20 mM HEPES, 5 mM MgCl2, 45-50 mM HGKEDP, 33 µg/ml DNA 
template and extract or Kc-FT (as described in Example 1, Section B.4). Transcription was 
initiated by the addition of 2 µl of pulse solution which contained 5 µCi [α-32P]CTP and 
brought the reaction to 600 µM in GTP, ATP, and UTP. The true specific activity of the pulse is 
determined by the contamination of CTP in the other NTPs which the inventor has estimated to 
be 1 µM. The pulse was continued for 15 seconds after which the reaction was brought to 1.2 
mM CTP by the addition of 12 µl of chase solution. The reactions were stopped by the addition 
of 200 µl of a Sarkosyl solution (1% Sarkosyl, 100 mM NaCl, 100 mM Tris pH 8, 10 mM 
EDTA and 100 µg/ml tRNA). Sample workup and analysis of labeled transcripts in denaturing 
gels was as described previously (Price et al., 1987). Briefly, samples were phenol extracted 
with an equal volume of water saturated phenol. The aqueous phase was precipitated for 10 min 
at -80°C after addition of 3 volumes of 95% ethanol containing 3 M ammonium acetate. The 
samples were spun at 15,000 rpm in a microfuge for 10 min. The supernatant was removed and 
discarded. The pellet and tube were washed with 200 µl of 70% ethanol and spun at 15,000 rpm 
for 3 min. The pellet was then dried, under vacuum, before being dissolved in gel loading buffer 
(0.25 x TBE, 8 M urea, with bromophenol blue and xylene cyanol). After heating the samples 
for 3-5 min at 80°C, they were loaded onto a 6% (or other concentration as desired) 
polyacrylamide gel cast in 1 x TBE, 6 M urea. After running, the gels were soaked in water 
containing 1 µg/ml ethidium bromide for 15 min before being dried, under vacuum, and exposed 
to autoradiographic film. 
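
The effect of contaminating unlabeled CTP on the true specific activity of the pulse, noted above, can be illustrated with a short calculation. The sketch assumes that the pulse amounts are in microcurie, microliter and micromolar units (as in the parallel protocol of Example 2), that the pulse reaction volume is 8 µl (6 µl preincubation plus 2 µl pulse solution), and that the estimated contamination refers to the CTP concentration in that reaction. These are interpretive assumptions, and the function name is hypothetical.

```python
# Sketch: dilution of label by contaminating unlabeled nucleotide.

def effective_specific_activity(activity_uCi, stock_Ci_per_mmol,
                                unlabeled_uM, volume_ul):
    """Specific activity (Ci/mmol) after mixing label with unlabeled nucleotide."""
    # 1 uCi at 1 Ci/mmol corresponds to 1e-6 mmol, i.e. 1000 pmol.
    hot_pmol = activity_uCi / stock_Ci_per_mmol * 1000.0
    # uM multiplied by ul gives pmol directly.
    cold_pmol = unlabeled_uM * volume_ul
    # uCi per pmol multiplied by 1000 gives Ci per mmol.
    return activity_uCi / (hot_pmol + cold_pmol) * 1000.0


if __name__ == "__main__":
    # 5 uCi of 3000 Ci/mmol CTP, 1 uM contaminating CTP, 8 ul reaction (assumed).
    print(round(effective_specific_activity(5.0, 3000.0, 1.0, 8.0), 1))
```

Under these assumptions the labeled CTP (about 1.7 pmol) is outnumbered by contaminating CTP (8 pmol), so the effective specific activity falls several-fold below that of the stock.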

B. Results 

1. Requirement of the CTD for Elongation 

Transcription reactions to assay P-TEF activity in partially purified fractions were 
performed using a continuous labeling protocol, basically as described previously (Price et al., 
1987). These reactions contained 20 mM HEPES, 5 mM MgCl2, 600 µM each of GTP, ATP 
and UTP, 30 µM CTP, 55-60 mM KCl, 3-4 µCi [α-32P]CTP, 5 µg/ml DNA template, and 
various protein-containing fractions in a total volume of 12.5 µl. A typical P-TEFb assay 
contained 0.2 µl of DNase inhibitor (DI), 0.2 µl of RNA polymerase II, 0.2 µl of dTFIIE, 1.5 
µl of concentrated P11-0.4 M step fraction, 0.1-0.2 µl of factor 2 and 0.5 µl of P-TEFa. A 
solution containing buffer, DNA, NTPs and MgCl2 was added last to start the reactions. 
Reactions were incubated 20 min at 23°C and stopped as described above. 

Using previous work on the chromatography of initiation factors as a guide (Price et al., 
1987), the fractionation of KcN and reconstruction of DRB-sensitive transcription was 
accomplished as described above. Complete reconstruction after the first column using flow 
through (FT), 0.3 M, 0.4 M and 0.75 M steps gives rise to DRB-sensitive transcription. When 
these four fractions were subjected to further chromatography, a system that only gave rise to 
DRB-insensitive transcripts was generated. The reconstruction required four fractions: DNase 
inhibitor (DI), TFIIE, RNA polymerase II and a crude P11-0.4 M step. The P11-0.4 M step 
fraction includes at least TFIIB and TFIIF and probably TFIID and TFIIH. 

2. DRB-sensitive Transcription Requires Three Protein Fractions 

Besides the fractions required for efficient initiation, three additional fractions are 
required to reconstruct efficient DRB-sensitive transcription. Two of these fractions contain 
P-TEFa and P-TEFb. The third is an activity previously described as factor 2 (Price et al., 1987). 
Using partially purified factors, P-TEFb is the only fraction that is strictly required for 
elongation. P-TEFa and factor 2 both stimulate the level of runoff transcript. 
P-TEFb alone was able to support a very low, but detectable level of DRB-sensitive 
transcription. 

3. P-TEFb Has Two Subunits 

Analysis of the final two stages of the purification, chromatography on Mono S™ and 
glycerol gradient sedimentation, indicates that P-TEFb activity correlates with fractions 
containing two polypeptides with apparent molecular weights of 124 and 43 kDa. P-TEFb was 
purified to near homogeneity three times, twice from KcN and once from Drosophila embryonic 
nuclear extract with nearly identical results. Glycerol gradient analysis of P-TEFb purified using 
scheme 1 gave nearly identical results. Comparison of the sedimentation of known proteins to 
that of P-TEFb indicates that the factor is a heterodimer. 


4. P-TEFb acts after initiation 

After purification of P-TEFb a transcription competent nuclear extract depleted of 
P-TEFb activity was generated. KcN was passed through P-11 at 0.4 M KCl. The resulting 
Kc-FT was capable of initiating transcription as efficiently as whole KcN, but produced only 10% of 
the DRB-sensitive transcripts (compare KcN with Kc-FT alone). Addition of P-TEFb to the 
Kc-FT restored the ability to generate DRB-sensitive runoff transcripts. The level of DRB-sensitive 
transcription achieved was dependent upon the concentration of P-TEFb added back. Even with 
the highest levels of P-TEFb added back, the ratio of DRB-sensitive to abortive transcripts 
remained similar to that seen previously in extract (Marshall and Price, 1992). Importantly, since 
P-TEFb was added to the reactions during the chase, the factor must have acted after initiation on 
the early elongation complexes. 

P-TEFb is not required for initiation and does not associate strongly with preinitiation 
complexes (Marshall and Price, 1992), but is shown here to act during elongation. The timing of 
DRB sensitivity (Marshall and Price, 1992) and P-TEFb action coincide closely. This is 
consistent with the activity of P-TEFb being inhibited by DRB. Since DRB is canonically a 
kinase inhibitor, these data suggest that P-TEFb is a cyclin kinase. Likely targets for a 
transcription factor kinase that acts early during elongation are RNA polymerase II or a basal 
initiation factor such as TFIIF. The subunit composition of P-TEFb displays no obvious 
similarity to known kinases involved in transcription regulation. 

EXAMPLE 2 

Control of RNA Polymerase II Elongation Potential by P-TEFb 


A. Materials and Methods 
1. Materials 

[α-32P]CTP (3000 Ci/mmol) was obtained from ICN. Ribonucleoside triphosphates were 
obtained from Pharmacia-LKB Biotechnology. DRB (Sigma) was dissolved in ethanol to 10 
mM and stored at -80°C. H-8 was obtained from Seikagaku America (Ijamsville, MD) and was 
dissolved in 20 mM HEPES, pH 7.6 to 20 mM and stored at 4°C. The magnetic concentrator 
used was the MPC-E from Dynal. All other chemicals were reagent grade. 

2. Proteolysis of RNA Polymerase II 

Partially purified RNA polymerase II was treated with 0.27 µg/ml chymotrypsin (Sigma) 
at 27°C in HGED (25 mM HEPES pH 7.6, 15% glycerol, 0.1 mM EDTA, 1 mM DTT) plus 110 
mM KCl for times ranging from 0 to 20 min. Digestions were terminated by the addition of 
trypsin inhibitor (Sigma) to 20 µg/ml. The extent of proteolysis of the RNA polymerase II 
subunits was assayed by SDS-PAGE followed by silver staining. 

3. Proteolysis of Early Elongation Complexes 

Preinitiation complexes were formed on an immobilized actin template digested with 
HpaII (780 nt runoff) as described (Marshall and Price, 1992). The 
preinitiation complexes were isolated, washed once with 55 mM HKB (20 mM HEPES, 55 mM 
KCl and 200 µg/ml BSA) and resuspended into 55 mM HKB. Transcription was initiated by the 
addition of a pulse solution, which contained 5 µCi of [α-32P]CTP and brought the reaction 
mixture to 600 µM in ATP, GTP and UTP and 2 mM MnCl2. MnCl2 increased the rate of 
initiation of preinitiation complexes, thereby increasing the number of polymerases in early 
elongation complexes after a short pulse. After 15 seconds the reaction was stopped by the 
addition of EDTA to 10 mM. 

These early elongation complexes were washed 3 times with 1 M HMKB (20 mM 
HEPES, 5 mM MgCl2, 1 M KCl and 200 µg/ml BSA) then once with 55 mM HMKB (20 mM 
HEPES, 5 mM MgCl2, 55 mM KCl and 200 µg/ml BSA). The washed early elongation complexes 
were resuspended in 55 mM HMKB and incubated with the indicated amount of chymotrypsin 
for 10 min. Proteolysis was terminated by adding trypsin inhibitor to 0.1 mg/ml. After 
concentration, digested early elongation complexes were either resuspended into 55 mM HMKB 
and chased with 600 µM of each NTP for 10 min in the presence or absence of Kc nuclear extract 
and 0.1 µl P-TEFb, or analyzed by SDS-PAGE followed by immunoblotting with affinity 
purified RNA polymerase II antibody. 

The P-TEFb used in FIG. 2A, FIG. 2B and FIG. 3 was purified from Drosophila embryonic 
nuclear extract as described in Example 1 with one addition. Material eluting from the 
Phenyl-Sepharose column was loaded directly onto a 10.0 ml ceramic hydroxylapatite column 
(Bio-Rad CHT10). The column was then eluted with a linear gradient of potassium phosphate from 
10 mM to 750 mM in 25 mM HEPES, pH 7.6, 15% glycerol. P-TEFb eluted between 400 mM 
and 500 mM phosphate. Pooled fractions containing P-TEFb were then dialyzed and 
chromatographed on Mono S™. P-TEFb eluting from Mono S™ still had significant nucleic acid 
contamination so the material was subjected to chromatography on Mono Q™ followed by Mono 
S™ for re-concentration. The peak fraction from the final Mono S™ column contained about 0.5 
mg/ml P-TEFb. 

7. CTD Kinase Assay 

0.4 µl of purified RNA polymerase II and various protein samples were mixed in 18 µl of 
55 mM HMK. The reaction was then initiated by the addition of 2 µl of a solution containing 2 
µCi of [γ-32P]ATP (ICN) and unlabeled ATP at 10 µM, or only unlabeled ATP at concentrations 
from 1 to 100 µM or other NTPs or dNTPs at concentrations of 1 to 100 µM. Reactions were 
incubated for the indicated times at 23°C and then terminated with SDS loading buffer. Samples 
were analyzed on a 6-15% SDS polyacrylamide gel which was silver stained, dried and subjected 
to autoradiography if the assay contained label. 


B. Results 

1. Productive Elongation Requires the CTD 

Drosophila RNA polymerase II was treated with chymotrypsin for increasing times to 
gradually remove the CTD (FIG. 1). Trypsin inhibitor was added to aliquots of the digestion 
reaction after 0, 2, 8, or 20 min. When trypsin inhibitor was added to a similar reaction before 
the chymotrypsin, no digestion took place during a subsequent 20 min incubation indicating that 
the protease was inactivated. Intact or truncated forms of the polymerase were then used to drive 
transcription from the Act5C promoter using fractions derived from Drosophila Kc cell nuclear 
extract containing factors needed for initiation and productive elongation. 

Most of the RNA polymerase II was removed by the fractionation procedure; however, 
some runoff transcripts were detected in the absence of added RNA polymerase II. When intact 
RNA polymerase II was added the runoff signal increased dramatically and was sensitive to 
DRB. The amount of DRB-sensitive, runoff transcript decreased as the CTD was removed. The 
amount of shorter, DRB-insensitive, abortive transcripts increased with added RNA polymerase 
II but did not change in amount as the CTD was removed. Quantitation of the autoradiograph 
and protein gel showed that the level of runoff transcript was directly related to the amount of 
polymerase containing the CTD while the generation of abortive transcripts was unaffected by 
loss of the CTD (FIG. 1). 

Earlier findings, using a minimal set of Drosophila fractions required for initiation, 
indicated there was no effect when substituting CTD-less polymerase for intact polymerase 
(Zehring et al., 1988). These earlier results were obtained using fractions that did not contain 
factors needed for the generation of DRB-sensitive productive elongation complexes (Marshall 
and Price, 1995). Therefore, the negative effect of removal of the CTD indicates that the CTD is 
involved in the transition into productive elongation. 

To address the possibility that truncation of the CTD in the previous study had an effect 
on initiation that resulted in the formation of exclusively DRB-insensitive complexes, a protocol 
for the truncation of the CTD after initiation was developed. Early elongation complexes were 
formed on an immobilized template and then washed with buffer containing 1 M KCl which 
removes uninitiated RNA polymerase II. The proteins found in early elongation complexes were 
stripped from the beads with SDS and analyzed by SDS-PAGE followed by western blotting. 
Rabbit anti-Drosophila large subunit antibodies to a non-CTD containing domain of the large 
subunit of RNA polymerase II were generated (see above in Methods, Section 4) and used to 
probe the western blot. 

Treatment with increasing amounts of chymotrypsin resulted in removal of the CTD from 
RNA polymerase II in early elongation complexes. Essentially complete truncation occurred 
when 0.2 ng/ml or more of chymotrypsin was used. The protease treatment did not negatively 
affect the ability of the isolated early elongation complexes to continue elongation during the 
subsequent chase. Except for the increase in the longest transcripts at the higher protease 
concentrations, the typical pattern of transcripts seen during abortive elongation on the actin 
template (Marshall and Price, 1992; Xie and Price, 1996) was detected. The increase in the 
longest transcripts is due to the removal of the remaining trace amount of factor 2 (Price et al., 
1987) which normally exerts a negative effect on elongation (Xie and Price, 1996). 

To assess the ability of the polymerase in early elongation complexes to enter productive 
elongation, nuclear extract was added back with the chase. In this study the extracts were 
complemented with a constant amount of additional P-TEFb which is normally limiting in the 
extracts. DRB-sensitive runoff transcripts due to P-TEF action were visible in the western blot 
lanes using non-proteolyzed complexes. As increasing amounts of chymotrypsin were used, the 
early elongation complexes lost the ability to form long DRB-sensitive transcripts. The long 
transcripts seen at high protease levels in the early elongation complexes alone were not seen 
when KcN was added because of the effect of endogenous factor 2 in the extract (Xie and Price, 
1996). These results show that the CTD is required for the generation of long DRB-sensitive 
transcripts. 

2. P-TEFb is a CTD Kinase 

The requirement of the CTD for the generation of DRB-sensitive long transcripts and the 
inhibition of the process by the kinase inhibitor DRB (Marshall and Price, 1992; Marshall and 
Price, 1995) prompted studies to determine if P-TEFb was a CTD kinase. Studies similar to 
those in Example 1, and prepared in the same manner, showed that incubation of P-TEFb with 
intact RNA polymerase II caused the incorporation of phosphate into the large subunit and a shift 
to the IIo form. 

Truncation of the CTD by chymotrypsin resulted in a loss of the ability of P-TEFb to 
phosphorylate the polymerase. The polymerase that was treated with chymotrypsin for 0, 2, 8, or 
20 min followed by addition of trypsin inhibitor (see Section B.1) was used as a substrate for 
pure Drosophila P-TEFb. Using the incorporation of label into the large subunit of RNA 
polymerase II as an assay (see Methods, Section A.7) the ability of P-TEFb to label the large 
subunit was directly related to the amount of the CTD remaining on the subunit. The intact 
polymerase was labeled very efficiently, while the CTD-less polymerase (20 min digestion) was 
not detectably phosphorylated. Intermediate truncations gave rise to intermediate levels of 
phosphorylation. 

To correlate the CTD kinase activity with P-TEFb function in transcription, fractions 
from a gradient elution of P-TEFb from a Mono S™ column and a subsequent glycerol gradient 
were tested in both assays. P-TEFb was about 25% pure in the peak fraction (Peterson et al., 
1992) from Mono S™ (Example 1; Marshall and Price, 1995). CTD kinase activity coeluted with 
the transcription activity of P-TEFb on Mono S™. Fractions from the glycerol gradient analysis 
of P-TEFb from Mono S™ fraction 30 were analyzed for transcription and CTD kinase activities 
on a silver-stained 6-15% SDS-polyacrylamide gel (SDS-PAGE). Both activities again 
co-migrated with each other and with the 124 and 43 kDa subunits of P-TEFb previously identified 
(Example 1; Marshall and Price, 1995). P-TEFb subunits, transcription function and CTD kinase 
activity correlated across all columns assayed. 

3. Characterization of P-TEFb CTD Kinase 

To begin to establish the basic parameters of the CTD kinase activity of P-TEFb, P-TEFb 
was incubated for 5 min with RNA polymerase II and various cold nucleotides. The products 
were subjected to SDS-PAGE and silver stained to visualize the RNA polymerase II. The 
mobility shift from the unphosphorylated IIa form to the highly phosphorylated IIo form of the 
large subunit was used to ascertain kinase activity. Partial shifts indicated lower levels of 
incorporation of phosphate. P-TEFb was capable of utilizing the purine nucleotides in rough 
order of efficiency ATP >> dATP = GTP > dGTP. ATP was used 10-100 times more efficiently 
than dATP. GTP was used ~2-fold more efficiently than dGTP. P-TEFb was unable to utilize 
any of the pyrimidine nucleotides at 100 µM. In a similar study it was found that even at high 
concentration, i.e. 600 µM CTP, 600 µM UTP, 600 µM dCTP, or 600 µM dTTP, other 
nucleotides were not used as substrates for P-TEFb in the phosphorylation of RNA polymerase 
II. 

5 4. Kinetics of Phosphorylation 

The kinetics of phosphorylation of the CTD were examined with three concentrations of 
P-TEFb and 10 µM ATP. At all concentrations of P-TEFb, intermediates between IIa and IIo
were seen at early time points, indicating a progressive increase in phosphorylation. At the
lowest concentration of P-TEFb ~50% of the large subunit molecules were progressively
phosphorylated, even though the molar ratio of P-TEFb to RNA polymerase II was about 1:50.
This result indicates that P-TEFb was not stably associated with the polymerase during the
addition of all phosphates and, therefore, was not completely processive. As the concentration of
P-TEFb was increased the fraction of the large subunit that was shifted to IIo increased to about
90%.

Comparison of the kinetics of the appearance of the intermediate forms indicates that
increasing the P-TEFb concentration had a modest but non-linear effect on the rate of the
phosphorylation. For example, comparing 1 min with 0.01 µl of P-TEFb to 30 seconds with 0.05
µl indicates that although less large subunit was affected, an equivalent mobility shift was
obtained. At all P-TEFb concentrations tested there were at least some polymerase molecules
that received no or very few phosphates while others were heavily phosphorylated to the IIo
form. These data are consistent with a slow initial phosphorylation event(s) followed by more
rapid subsequent phosphate incorporation. The different rates could be due to conformational
changes in the polymerase or CTD required for initial kinase action or to increased reactivity of
partially phosphorylated CTD.

5. Sensitivity of the Kinase to Inhibitors

The sensitivity of the kinase to both DRB and H-8 under various conditions was 
observed. At 10 µM ATP (the same concentration of ATP used in the labeling assay), P-TEFb
was strongly inhibited by DRB at 10 µM. Under the same conditions, approximately 10 times as
much H-8 was required to inhibit the kinase to the same level. This 10-fold difference was seen
consistently under all conditions tested. At 100 µM ATP both drugs were less effective. The
inhibition by DRB and H-8 was directly related to the concentration of nucleotides used.
Because the inhibition by DRB and H-8 was affected by nucleotide levels, the relative inhibition
between DRB and H-8 can be utilized as a more useful way to compare P-TEFb with other
kinases.

Since the CTD kinase activity of P-TEFb was sensitive to H-8, the effect of the drug
during transcription in nuclear extracts was examined. Increasing amounts of DRB or H-8 were
included in a continuous labeling assay. As seen earlier, runoff transcription was very sensitive to
DRB with a 50% inhibition point of 0.7 µM (FIG. 2A). Under identical transcription conditions,
H-8 had a 50% inhibition point of 7 µM (FIG. 2B). This 10-fold difference is the same as that
seen with the CTD kinase assay. As expected for an inhibitor of productive elongation, initiation
and abortive transcription were unaffected by H-8 even at concentrations that severely inhibited
the appearance of runoff. These data show that P-TEFb is the target of these kinase inhibitors
during transcription.

6. P-TEFb Phosphorylates the CTD of RNA Polymerase II in Early Elongation

The data herein show that an intact CTD is required for productive elongation and that
P-TEFb can phosphorylate pure RNA polymerase II. Since P-TEFb acts during elongation
(Marshall and Price, 1995), it should phosphorylate the CTD if the polymerase is in an early
elongation complex. To test this, western blot analysis was used to determine the
phosphorylation state of the polymerase in isolated transcription complexes.

Preinitiation complexes were formed on an immobilized actin template as described 
above in Methods, Section 3. When these complexes were washed with low salt buffer they 
remained intact and western blot analysis indicated that the large subunit of RNA polymerase II
was hypophosphorylated. Incubation of the preinitiation complexes with ATP caused a
significant shift of the polymerase to the IIo form. The antibodies used to detect the large
subunit of RNA polymerase II reacted more strongly with the IIa form of the large subunit than
either the phosphorylated IIo form or the truncated IIb form. This phosphorylation was
unaffected by the presence of 20 µM DRB.

This result is consistent with the previous data that P-TEFb is not associated with 
preinitiation complexes (Marshall and Price, 1992; Marshall and Price, 1995; see Example 1). 
Essentially all of the RNA polymerase II was removed by washing with buffer containing 1 M
KCl. When the preinitiation complexes were incubated under transcription conditions for a short
time, a portion of the polymerases initiated and, therefore, remained associated with the template
during the high salt wash. The CTD kinase found in the preinitiation complex was no longer
associated with the polymerase because incubation with ATP had no effect. However, when
increasing amounts of P-TEFb were incubated with the high salt washed early elongation
complexes, the IIa form of the polymerase decreased and the IIo form increased. The ability of
P-TEFb to phosphorylate RNA polymerase II in an early elongation complex was inhibited by 20
µM DRB.

7. TFIIH Does Not Functionally Substitute for P-TEFb 

To determine whether P-TEFb was related to the protein kinase associated with TFIIH
(Roy et al., 1994; Feaver et al., 1994; Cismowski et al., 1995; Serizawa et al., 1995; Shiekhattar
et al., 1995), P-TEFb was compared with Drosophila TFIIH purified from Drosophila embryos
by the method of Austin and Biggin (1996).

The two proteins did not have any subunits in common when analyzed by SDS-PAGE
and silver staining. The large subunit of P-TEFb ran as a doublet which may be due to
phosphorylation, since autophosphorylation of both P-TEFb subunits had been previously
observed.

To compare the activities of the two factors, both were tested in the CTD kinase assay
under identical conditions. Using equal concentrations of the two kinases, as determined by
relative protein staining, both were able to incorporate similar amounts of 32PO4 into the CTD of
RNA polymerase II and cause the shift to the IIo form. The CTD kinase activity of P-TEFb was
sensitive to 20 µM DRB while that of TFIIH was unaffected. Since DRB inhibition could be
affected by enzyme levels, titration studies were used to determine that the levels of P-TEFb and
TFIIH were not saturating. To determine if TFIIH could functionally replace P-TEFb, similar
amounts of both proteins were tested in the P-TEFb-dependent Kc-FT (Marshall and Price, 1995;
Example 1) transcription assay. As expected, addition of increasing amounts of P-TEFb led to
an increase in DRB-sensitive runoff transcripts (FIG. 3). In contrast, addition of even the highest
levels of TFIIH had little effect on the amount of runoff even though the added TFIIH had high
CTD kinase activity when assayed using purified RNA polymerase II. Therefore, it appears that
TFIIH cannot substitute for P-TEFb during the transition into productive elongation.

The 10-fold difference between levels of DRB and H-8 required to achieve equal 
inhibition of CTD phosphorylation remained constant under all conditions examined. Since a 
10-fold difference in the levels of DRB and H-8 was required to inhibit transcription, this 
indicates that the effect of the drugs in transcription is due to inhibition of the CTD kinase 
activity of P-TEFb. 

After the initial transition into productive elongation has been passed, it may be necessary 
to maintain the highly phosphorylated state of the polymerase. However, the results of Egyhazi
et al. (1996) and Kephart et al. (1992) suggest that if CTD kinases are needed for maintenance of
the IIo form during elongation then these kinases are not sensitive to DRB. Therefore, P-TEFb
does not appear to be involved in maintenance. Maintenance could be accomplished by the
inhibition of CTD phosphatase (Chambers et al., 1995) activity.

EXAMPLE 3

Drosophila P-TEFb Increases the Processivity of Human RNA Polymerase II 



A. Materials and Methods 

1. Materials 

Ribonucleoside triphosphates were obtained from Pharmacia LKB Biotechnology Inc. 
[α-32P]CTP (3000 Ci/mmol) was purchased from ICN Pharmaceuticals. Streptavidin-coated
paramagnetic beads (Dynabead™ M280) were obtained from Dynal (Great Neck, N.Y.). All 
other reagents were as described previously (Marshall and Price, 1992). 

2. Growth Media



D22 Medium for Growth of Kc cells (for 50 L):

A:  MgCl2 • 6H2O                     50 g
    MgSO4 (anhydrous)                82 g
    CaCl2 (anhydrous)                40.5 g
    L-glutamic acid                  530 g
    Glycine                          270 g

B:  Potassium hydroxide (45%)        152.5 ml
    Sodium hydroxide (50%)           180.5 ml

C:  L-malic acid                     30.5 g
    Succinic acid                    2.75 g
    Glucose                          91 g
    Sodium acetate                   0.7 g
    Lactalbumin hydrolysate          682 g
    Yeastolate                       68 g

D:  Grace's vitamins, 100 ml

E:  Potassium hydroxide (45%) to pH 6.7

F:  NaH2PO4 • H2O (100X) pH 6.7, 500 ml
    (100X = 3.8 g/100 ml)

The ingredients are added in sequential order (A, then B, then C, then D, then E, then F) and
brought to a final volume of 50 L. The medium is filter sterilized by starting with a large-pore
filter and ending with a 0.22 µm final filter.



Grace's Vitamins (modified) for 1 liter:

chemical                     mg/liter

Biotin                       50
D-Ca Pantothenate            10
Choline Chloride             1000
Folic Acid                   10
i-Inositol                   100
Niacin                       100
para-aminobenzoic acid       100
Pyridoxine HCl               10
Riboflavin                   10
Thiamine HCl                 10

Add 45% KOH until all material is dissolved. Filter or store at -20°C.

3. Solutions 



HGE (4 liters)

    25 mM HEPES                  23.8 g
    15% glycerol                 600 ml
    0.1 mM EDTA                  0.8 ml (0.5 M)
    (adjust pH to 7.6 with 50% NaOH)

1 M HGKE (2 liters)

    25 mM HEPES                  11.9 g
    15% glycerol                 300 ml
    0.1 mM EDTA                  0.4 ml (0.5 M)
    1 M KCl                      149.1 g
    (adjust pH to 7.6 with 50% NaOH)

Buffer A (500 ml)

    15 mM KCl                    7.5 ml (1 M)
    10 mM HEPES                  10 ml (0.5 M)
    2 mM MgCl2                   1 ml (1 M)
    0.1 mM EDTA                  0.1 ml (0.5 M)

Buffer B (200 ml)

    1 M KCl                      14.9 g
    50 mM HEPES                  20 ml (0.5 M)
    30 mM MgCl2                  6 ml (1 M)
    0.1 mM EDTA                  0.04 ml (0.5 M)

Stock solutions: 0.5 M HEPES, pH 7.8; 0.5 M EDTA, pH 8.3; 1 M MgCl2.

Complete the above buffers just before use with: 1 mM DTT from 1 M stock (stored at -80°C)
and 1:1000 dilution of isopropanol saturated with PMSF at room temperature.

Stock solution: 4 M (NH4)2SO4, pH 7.9

All buffers are filter sterilized. All glassware is autoclaved. 
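
The stock volumes and solid masses listed for the buffers above follow from ordinary dilution
arithmetic. The Python sketch below is an illustration only and is not part of the described
procedure. The function names are hypothetical, and the molecular weights (KCl, 74.55 g/mol;
HEPES free acid, 238.3 g/mol) are standard values assumed here, not values stated in the text.

```python
# Illustrative sketch only; helper names are hypothetical and the molecular
# weights are standard reference values assumed for this example.
KCL_MW = 74.55     # g/mol, potassium chloride
HEPES_MW = 238.3   # g/mol, HEPES free acid


def stock_volume_ml(final_mM, final_volume_ml, stock_M):
    """Volume of stock (ml) that gives final_mM in final_volume_ml (C1*V1 = C2*V2)."""
    if stock_M <= 0:
        raise ValueError("stock concentration must be positive")
    return (final_mM / 1000.0) * final_volume_ml / stock_M


def solid_mass_g(final_M, final_volume_l, mol_weight):
    """Mass of solid (g) that gives final_M in final_volume_l."""
    return final_M * final_volume_l * mol_weight


if __name__ == "__main__":
    # Buffer A (500 ml): 15 mM KCl from a 1 M stock
    print(round(stock_volume_ml(15, 500, 1.0), 2), "ml of 1 M KCl")
    # Buffer B (200 ml): 0.1 mM EDTA from a 0.5 M stock
    print(round(stock_volume_ml(0.1, 200, 0.5), 3), "ml of 0.5 M EDTA")
    # 1 M HGKE (2 liters): solid KCl
    print(round(solid_mass_g(1.0, 2.0, KCL_MW), 1), "g KCl")
    # HGE (4 liters): solid HEPES at 25 mM
    print(round(solid_mass_g(0.025, 4.0, HEPES_MW), 1), "g HEPES")
```

The same two functions can be used to rescale any of the recipes to a different final volume.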

4. DNA templates 

The Drosophila actin A2 (RI/PstI) DNA template has been described (Marshall and
Price, 1992). Transcription of this template after digestion with HpaI yields a 520-nucleotide
(nt) runoff transcript. The HIV-1 template (pLTR-4/CAT), derived from HIV-SF2, was obtained
from P. Luciw (Sanchez-Pescador et al., 1985). It contains the HIV-1 LTR from -153 to +80
and has the CAT gene downstream of the LTR. Transcription of this construct after digestion
with NcoI produces a 633 nucleotide runoff transcript. Transcription of the HIV-1 construct after
digestion with BamHI produces a 1640 nucleotide runoff and digestion with XbaI produces a
4000 nucleotide runoff transcript. Biotinylated Drosophila A2 DNA templates were made as
previously described (Marshall and Price, 1992).

5. Kc Cells and Cell Extracts

Growth of Kc cells and preparation of nuclear extracts are as described (Price et al.,
1987).

a. freezing and thawing cells: 

Cells are grown in spinner culture to 4 x 10^6 cells/ml then gently spun in 50 ml conical
tubes in a JS-4.2 rotor (Beckman) at 500 rpm/10 min/4°C. Spun cells are resuspended in 4.5 ml
ice-cold fetal calf serum and 0.5 ml DMSO or 1/10 original volume. Aliquots of 1 ml each are
placed into cryotubes while on ice, frozen at -20°C for 1 h and then transferred to -80°C
overnight. Cells can be stored for months at -80°C or longer in liquid N2 without impairment to
function.

Frozen cells are thawed at room temperature and diluted 10-20 fold with D22 media in a
25 ml tissue culture flask. The D22 media is removed and replaced with 6-fold fresh D22 media
when cells have attached to the flask (approximately 30 min). Fresh D22 media is added over
the next two days until the cells reach the desired density and can be split.

b. continuation of line: 

The cells should be continued in flasks with care being taken never to allow them to 
become overgrown, as evidenced by the release of all cells from the bottom of the flask with only 
gentle tilting. The cells will tolerate lower densities in flasks due to the close proximity of other
settled cells, but the cultures should be split at least every 3.5 days or one day before cells
become detached by tilting. 

c. growth for production of extracts:

Careful control of the conditions of growth of Kc cells is critical for the production of
high quality extracts for in vitro transcription. The cells must be maintained between 1.5 x 10^6
and 8 x 10^6 cells/ml in spinner cultures, preferably between 2 x 10^6 and 6 x 10^6 cells/ml. Cells
will double every 16 to 18 h when growing well. Cells that are diluted too much will not grow,
and cells that reach densities of over 8 x 10^6 cells/ml will not yield good extracts even if their
growth rate is good after dilution and further growth. Extracts should be made when cells are
between 3 x 10^6 and 5 x 10^6 cells/ml. Most of the cells in a good culture will be attached to
another cell. However, the stirring rate should be increased if many cells are attached in chains
of four or longer.
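
The density window and doubling time given above can be used to estimate when a culture will
need to be diluted or harvested. The following Python sketch is offered only as an illustration;
it assumes ideal exponential growth at a constant doubling time, which is a simplification not
stated in the text, and the function names are hypothetical.

```python
import math


def hours_to_reach(start_density, target_density, doubling_time_h):
    """Hours for a culture to grow from start_density to target_density (cells/ml).

    Assumes ideal exponential growth with a constant doubling time.
    """
    if start_density <= 0 or target_density <= 0 or doubling_time_h <= 0:
        raise ValueError("densities and doubling time must be positive")
    return doubling_time_h * math.log2(target_density / start_density)


def density_after(start_density, hours, doubling_time_h):
    """Expected density (cells/ml) after the given number of hours."""
    return start_density * 2 ** (hours / doubling_time_h)


if __name__ == "__main__":
    # Growth from the preferred lower bound (2 x 10^6) to the upper harvest
    # density (5 x 10^6) for the two doubling times quoted in the text.
    for doubling in (16, 18):
        print(doubling, "h doubling:", round(hours_to_reach(2e6, 5e6, doubling), 1), "h")
```

Under these assumptions a culture split to 2 x 10^6 cells/ml passes through the 3 x 10^6 to
5 x 10^6 cells/ml harvest window roughly within the following day.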

d. Preparation of Kc Cell Nuclear Extract, KcN:

This protocol details the production of a nuclear extract from Drosophila Kc cells. It has
been described in Price et al. (1987). A standard preparation starts with 18 liters of cells, but the
protocol can be scaled down or up as desired. Provisions are given to save the cytoplasmic
portion of the cells which may contain other cellular factors of interest. Phenol extraction of the
cytoplasm will produce massive quantities of high quality Kc cell cytoplasmic RNA. This
procedure produces extracts that are more active than any other extracts reported. The procedure
can also be used on other types of cells including mammalian cells. Cells are grown in D22
media to a density of 3 to 5 x 10^6 cells/ml. Prepare, at 4°C, solutions A, B and A + 1/15 B. Add
DTT and PMSF just prior to use.

e. Cell harvesting: 

Kc cells are aliquoted into centrifuge bottles, about 65 ml per bottle, and spun in a
Beckman JS-4.2 rotor at 4.2K rpm (4000 x g) for 7 min. The supernatant is removed and the
pelleted cells are retained. 10 ml Buffer A + 1/15 B is added to each centrifuge bottle and the
cells are resuspended and transferred to four 40 ml polycarbonate tubes. A 10 ml wash of Buffer
A + 1/15 B is used to wash all of the bottles and the wash is added to the appropriate
polycarbonate tube. The cells are spun in the Beckman JS-13.1 swinging bucket rotor for 5 min
at 4,000 rpm (4000 x g).

The supernatant is discarded and the pelleted cells are gently resuspended in
Buffer A (10 ml/tube). Resuspended cells are spun in a JS-13.1 rotor for 5 min at 5,000 rpm. The
slightly cloudy supernatant is discarded. The pelleted cells are resuspended in Buffer A to 40 ml
total volume, transferred to a 40 ml Dounce homogenizer and homogenized until cell lysis is
greater than 90%. Cell lysis is normally 95%-98%. 3 ml of Buffer B is added and lysed cells
are briefly homogenized again. Lysed cells are transferred to a polycarbonate tube.

The lysed cells are spun in a JS-13.1 rotor for 8 min at 8,000 rpm to pellet nuclei. The
supernatant (KcC) is transferred to disposable 50 ml conical tube(s) and placed on ice. 10 ml of
Buffer A is added to each tube and nuclei (KcN) are gently mixed until completely resuspended.
The volume of the resuspended nuclei is brought to a total volume of 40 ml with Buffer A. The
nuclei are uniformly resuspended by gentle homogenization.

The nuclear suspension is transferred to 28 ml Oak Ridge tubes, 20 ml/tube. 2 ml of 4 M
(NH4)2SO4 is added to each tube (360 mM (NH4)2SO4 final) and mixed for 30 minutes at 4°C.
The resulting suspension is transferred to fresh Oak Ridge tubes. Both the nuclear suspension
and the saved supernatant are spun in a Beckman 55.2Ti rotor for 45 min at 45,000 rpm (150,000
x g) to pellet the chromatin.

The resulting supernatant from the nuclear suspension is transferred into a fresh Oak
Ridge tube containing 5.5 g solid (NH4)2SO4 (0.25 g/ml) and mixed for 20 min at 4°C. The spun
KcC supernatant is frozen at -80°C. The protein precipitated from the nuclear supernatant with
ammonium sulfate is pelleted in the Beckman 55.2Ti rotor for 15 min at 45,000 rpm and the
supernatant is removed.
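
The two ammonium sulfate quantities quoted above can be checked with a short calculation. The
Python sketch below is an illustration under stated assumptions: volumes are treated as simply
additive, and the supernatant volume in each tube is taken as the 22 ml present after the stock
was added, ignoring the volume of the pelleted chromatin. The function names are hypothetical.

```python
def final_conc_mM(stock_M, stock_ml, sample_ml):
    """Final concentration (mM) after adding stock_ml of stock to sample_ml.

    Assumes the volumes are additive.
    """
    return stock_M * 1000.0 * stock_ml / (sample_ml + stock_ml)


def solid_needed_g(volume_ml, grams_per_ml):
    """Mass of solid (g) required at a given dose in g per ml of solution."""
    return volume_ml * grams_per_ml


if __name__ == "__main__":
    # 2 ml of 4 M ammonium sulfate added to 20 ml of nuclear suspension
    print(round(final_conc_mM(4.0, 2.0, 20.0)), "mM after extraction step")
    # Solid ammonium sulfate at 0.25 g/ml for an assumed 22 ml of supernatant
    print(solid_needed_g(22.0, 0.25), "g solid ammonium sulfate")
```

On these assumptions the extraction step works out to roughly 364 mM, in line with the stated
360 mM, and 0.25 g/ml applied to 22 ml corresponds to the 5.5 g placed in each tube.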

f. Pellet resuspension and dialysis:

The protein pellet is dissolved in 1/2 of 1 nuclear volume of HGEDP (HGE + DTT and
PMSF), transferred to a 7 ml Dounce homogenizer and homogenized gently until the protein is
completely dissolved. 25 ml HGKE is added to 475 ml HGEDP to make 500 ml HGKEDP. The
solubilized protein is dialyzed against 500 ml of 50 mM HGKEDP for 2.5-3.0 h at 4°C. The
final salt concentration should be 120-150 mM KCl. Dialyzed protein is aliquoted and frozen at
-80°C.

6. Preparation of HeLa Cell Nuclear Extract

The growth of HeLa cells and preparation of nuclear extracts (NEs) are essentially as
described by Dignam et al. (1983) with some modification as described below. P-TEFb was
purified from Drosophila embryonic nuclear extract as previously described in Example 1. This
protocol details the production of a nuclear extract from mammalian cervical cancer, HeLa, cells.
It is very similar to the Kc cell extract described above in Section A.5 and by Price et al. (1987).
Modifications to the cell harvest are described below. All other methods for preparation of HeLa
cells are described in Section A.5 above.

a. cell harvest: 

HeLa cells are grown on plates or in spinner flasks (5 x 10^6 - 7.5 x 10^6 cells/ml) in
DMEM/10% Calf Serum. The HeLa cells are either scraped off of T-150 Plates or cells grown in
D22 media are spun down in a Beckman JS-4.2 rotor at 4.2K rpm (4000 x g) for 5 min and the
pelleted cells are recovered. Cells are resuspended as before in Buffer A + 1/15 B except that
only 1/2 of the volume indicated in Section A.5 is used. Volumes of all following isolation and
purification steps are modified accordingly.

7. Transcription Reactions 

A pulse-chase protocol was used in which the template DNA was preincubated with the
extract, 20 mM HEPES, pH 7.6 and 10 mM MgCl2 for 20 min at 30°C. Nucleotides, including
[α-32P]CTP, were added to start the pulse; 2 min later, excess cold CTP was added to initiate the
chase. Reactions were stopped after the indicated chase times. During the pulse, 10 µl reaction
mixtures contained the following: 20 mM HEPES (pH 7.6), 10 mM MgCl2, 600 µM each GTP,
UTP and ATP, 5 µCi of [α-32P]CTP (~1 µM CTP), 66 mM KCl, 10-40 µg/ml DNA template,
and 3 µl of HeLa nuclear extract. For the chase, unlabeled CTP was added to bring the total
concentration of CTP to 1.2 mM and the final reaction volume to 12 µl. Reaction mixtures
containing 250 mM KCl were supplemented with KCl at the beginning of the chase. Drosophila
P-TEFb was added to the preincubation mixture where indicated. Reactions were stopped by
adding 200 µl of stop solution (1% Sarkosyl, 50 mM Tris, pH 8.0, 50 mM EDTA, 100 mM NaCl
and 100 µg/ml tRNA). The reaction mixtures were phenol extracted and the nucleic acids were
ethanol precipitated, washed with 70% ethanol, dried and analyzed by gel electrophoresis.
Transcription reactions with Kc cell NE were as described (Kephart et al., 1992).

8. Gel Electrophoresis 

For polyacrylamide gel electrophoresis, samples were resuspended in 7 µl of 0.25X TBE
(1X TBE contains 89 mM Tris base, 89 mM boric acid and 2 mM EDTA) and 8 M urea, heated
for 3 min at 85°C and analyzed in 6% acrylamide-8 M urea-1X TBE gels. Denaturing agarose
gel electrophoresis was performed by resuspending samples in 50% formamide-2.2 M
formaldehyde-1X morpholinepropanesulfonic acid (MOPS) buffer (20 mM MOPS, pH 7.0; 5
mM sodium acetate, 1 mM EDTA) and heating the mixture at 60°C for 5 min. Samples were
resolved in a 2% agarose gel containing 2.2 M formaldehyde and 1X MOPS buffer.

B. Results

Previous results using Drosophila DNA templates and Kc cell nuclear extract (NE) have
shown that two classes of RNA polymerase II elongation complexes are present during
transcription in vitro (Marshall and Price, 1992). These are either elongation complexes that
produce short transcripts or productive elongation complexes capable of making long runoff
transcripts. P-TEFb, a factor required for the transition into productive elongation (Marshall and
Price, 1995), functions via its ability to phosphorylate the CTD of the large subunit of RNA
polymerase II in early elongation. Both the function of P-TEFb and its CTD kinase activity are
inhibited by low levels of DRB (see Example 2). The present invention demonstrates that a
human P-TEFb homolog is involved in controlling elongation in the HeLa transcription system.



1. DRB-Sensitivity of Human RNA Polymerase II Elongation Complexes 

To begin to examine elongation control in a HeLa in vitro transcription system, the 
general properties of human elongation complexes were studied using conditions similar to those 
used to study Drosophila complexes in Example 2. 



Using a HeLa NE, pulse-chase reactions were performed at either 10 or 40 µg/ml template
with or without DRB. After a 2 min pulse under limiting CTP conditions the incomplete
transcripts had a distinct pattern. More early elongation complexes (EECs) were formed at the
higher template concentration as indicated by the increase in intensity of the transcripts after the
pulse. Addition of 50 µM DRB inhibited the formation of the longest transcripts during the
pulse. When the EECs formed during the pulse were chased for the indicated times, a very
processive, DRB-sensitive elongation complex was identified. Even though more polymerase
molecules initiated at the higher template concentration, a similar number ultimately reached
runoff and the runoff accumulated more quickly at the lower template concentration.

These results indicate that, as has been seen in a Drosophila transcription system, P-TEF
was limiting for the generation of DRB-sensitive runoff transcripts (Marshall and Price, 1992).
A difference between the properties of the elongation complexes formed in the HeLa system and
the Drosophila system was that at long time points only 60% (instead of greater than 95%) of the
runoff generated was DRB-sensitive.

To determine if the DRB-insensitive elongation complexes that were capable of making
long transcripts arose from incomplete inhibition by DRB, DRB was titrated from 10 to 135 µM.
Maximum inhibition by DRB occurred at 30 µM. Increasing the DRB concentration to 135 µM
did not inhibit the formation of the DRB-insensitive processive elongation complexes.
Therefore, 50 µM DRB was used in all subsequent studies. These results show that the HeLa
transcription system gave rise to both types of elongation complexes found in the Drosophila
system and in addition produced a class of DRB-insensitive, processive elongation complexes
not seen in the Drosophila system.

It was possible that the apparent difference between the two systems was merely due to
properties of HeLa RNA polymerase II and that the DRB-insensitive transcripts generated by the
human polymerase were just longer than those seen in the Drosophila system. If this were so,
then longer transcripts would be more sensitive to DRB than shorter ones.

To determine if this were the case, templates that produced longer runoff transcripts were
used. Two min pulse-chase time course assays using DNA templates that generate 1640 or 4000
nucleotide runoff were carried out. At longer time points both DRB-sensitive and
DRB-insensitive processive elongation complexes generated 1640 and 4000 nucleotide runoff
transcripts. Addition of 250 mM KCl to the chase caused the EECs to elongate at a slower rate,
compared to 60 mM KCl in a normal chase.

As was found in the Drosophila system (Kephart et al., 1992; Marshall and Price, 1992),
high salt suppressed both positive and negative elongation factors. The rate of elongation was
slowed when compared to that seen for productive elongation complexes under normal salt
conditions. As in the Drosophila system (Kephart et al., 1992; Marshall and Price, 1992), early
blocks to elongation (promoter proximal pausing) are apparently removed by the high salt
treatment. Rous sarcoma virus LTR, adenovirus major late promoter and SV40 promoter
constructs were also examined and gave similar results to those seen with the HIV-1 DNA
template.

2. Demonstration of P-TEFb-like activity in HeLa nuclear extracts

In the Drosophila system the DRB-sensitivity of the generation of highly processive
elongation complexes is due to the CTD kinase activity of P-TEFb (Marshall and Price, 1995).
To examine the similarity of the DRB-sensitive process in HeLa transcription to that of the well
characterized Drosophila system, the sensitivity of HeLa elongation complexes to both DRB and
H-8 was compared.

Two min pulse/45 sec chase transcription assays were performed with variable amounts
of DRB or H8. The inhibition of the appearance of runoff was quantitated and plotted versus the
concentration of inhibitor (FIG. 4). The 50% inhibition point occurred at 1 µM DRB compared
to 20 µM H8 (FIG. 4). Greater sensitivity to DRB was also seen with the Drosophila system
which was 50% inhibited by 0.7 µM DRB compared to 7 µM H8 (see Example 2). It has been
shown that the sensitivity of the CTD kinase activity of P-TEFb is very dependent on the
conditions used in the assay (Marshall and Price, 1995; see Example 1). Conditions used here
were similar to those used in the Drosophila study. These inhibitor studies indicate that there is
a HeLa equivalent to Drosophila P-TEFb.
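
A 50% inhibition point of the kind read from a plot of runoff versus inhibitor concentration
can be estimated by interpolating between the two measured points that bracket 50% activity.
The Python sketch below shows one common way to do this, interpolating linearly on a
logarithmic concentration axis. It is an illustration only: the interpolation method is an
assumption and is not stated to be the method used for FIG. 4, the function names are
hypothetical, and the titration values in the example are invented placeholders, not data from
this study. Only the 50% inhibition points passed to fold_difference (1 µM DRB and 20 µM H8 in
the HeLa system; 0.7 µM and 7 µM in the Drosophila system) come from the text.

```python
import math


def ic50_log_interp(concs, activities):
    """Estimate the 50% inhibition point from a titration.

    concs: inhibitor concentrations in increasing order (all > 0).
    activities: fraction of uninhibited runoff remaining at each concentration.
    Interpolates linearly in log10(concentration) between bracketing points.
    """
    points = list(zip(concs, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            t = (a_lo - 0.5) / (a_lo - a_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("activity does not cross 50% within the tested range")


def fold_difference(ic50_a, ic50_b):
    """Ratio of two 50% inhibition points (a relative to b)."""
    return ic50_a / ic50_b


if __name__ == "__main__":
    # Placeholder titration (hypothetical values, for illustration only)
    example_concs = [0.1, 0.3, 1.0, 3.0, 10.0]
    example_activity = [0.95, 0.80, 0.55, 0.25, 0.05]
    print(ic50_log_interp(example_concs, example_activity))
    # 50% inhibition points quoted in the text (µM): H8 relative to DRB
    print("HeLa:", fold_difference(20, 1))
    print("Drosophila:", fold_difference(7, 0.7))
```

Comparing the ratio of the two inhibition points, as in fold_difference, follows the approach
of Example 2, where the relative inhibition by DRB and H-8 is preferred over absolute values
because both depend on nucleotide concentration.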

Since P-TEF was first identified using an immobilized template assay, the ability of HeLa
NE to generate DRB-sensitive elongation complexes when added back to isolated EECs was
examined. Drosophila actin A2 DNA template and Kc cell NE were used to form EECs
containing Drosophila RNA polymerase II. An immobilized A2 template was preincubated with
Kc cell NE and then pulsed for 30 sec with [α-32P]CTP. The resulting EECs were isolated and
washed. Only short transcripts were produced during the pulse and the labeled tRNAs were
removed by the washing protocol. Addition of only NTPs during the 15 min chase generated a
pattern of short and intermediate transcripts that were DRB-insensitive and did not make runoff
transcripts. However, addition of Kc cell NE to the chase allowed some EECs to enter
productive elongation and make long DRB-sensitive transcripts. When a HeLa NE that was
capable of producing high levels of DRB-sensitive transcription was added to the chase,
DRB-sensitive transcripts were also produced.

The HeLa NE used in the other experiments in this study was not able to generate an easily
detectable level of DRB-sensitive transcripts in the add-back assay, most likely due to the fact
that isolated early elongation complexes are not as sensitive to P-TEF as they are when they are
formed in the presence of the extract. When α-amanitin was added prior to the chase, elongation
by all EECs was inhibited, proving that transcripts were produced by RNA polymerase II. When
an immobilized HIV-1 DNA template was used to form human EECs, both Kc cell and HeLa NE
could be added back to form DRB-sensitive transcripts.

3. Effect of Purified Drosophila P-TEFb on Transcription

The results presented here show that the HeLa NE used for most studies has limiting
amounts of a human P-TEFb-like activity. A number of Drosophila basal transcription factors
have been shown to be able to replace their human counterparts (Kephart et al., 1994; Wampler
and Kadonaga, 1992; Wampler et al., 1990). Purified Drosophila P-TEFb was studied to
determine if it could function in the human system. Pulse/chase transcription reactions using
HeLa NE were carried out. Reactions were chased for either 15 seconds or 1 min and contained
either 10 or 40 µg/ml template.

Addition of Drosophila P-TEFb increased the level of DRB-sensitive transcripts present
at either time point. The largest increase in DRB-sensitive transcripts was obtained with 40
µg/ml template. At either template concentration addition of P-TEFb also reduced the amount of
short transcripts. The short transcripts also reappeared upon addition of DRB, demonstrating that
P-TEFb acts on complexes that are destined to produce short transcripts in the absence of
functional P-TEFb.

These results indicate that the CTD kinase activity of P-TEFb is responsible for its effect
on transcription and that a large number of phosphorylation events on an EEC by P-TEFb can
cause productive elongation. If human P-TEFb is limiting and multiple phosphorylation events
are required for the transition into productive elongation, then increasing the number of EECs in
a reaction with a constant amount of P-TEFb would cause a decrease in the DRB-sensitive runoff
transcripts because the activity of P-TEFb would be spread over a larger number of complexes
and the number of EECs that receive the required number of phosphates would be less.

When a template titration from 5 to 40 µg/ml template with a constant amount of HeLa
NE was done, the number of EECs increased with the increase in template but the level of
DRB-sensitive runoff decreased (FIG. 5). When pure Drosophila P-TEFb was added to otherwise
identical reactions the level of DRB-sensitive runoff transcripts was dramatically increased.
Without additional P-TEFb the level of DRB-sensitive runoff began to decrease above 10 µg/ml,
but with additional P-TEFb the level of DRB-sensitive runoff continued to increase at even 40
µg/ml. These results indicate that P-TEFb acts as a CTD kinase that must act multiple times on
each EEC.
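
The reasoning behind the template titration can be made concrete with a toy calculation. The
Python sketch below is an illustration only and is not a model proposed or tested in this
example. It assumes that a fixed total number of phosphorylation events is distributed at
random over all EECs, so that the count per EEC follows a Poisson distribution, and that an EEC
becomes productive only once it has received at least a threshold number of phosphates. The
function names and all numerical values are hypothetical.

```python
import math


def prob_at_least(required, mean):
    """P(X >= required) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(required))
    return max(0.0, 1.0 - below)


def productive_complexes(n_eec, total_phosphates, required):
    """Expected number of EECs that reach the phosphate threshold.

    total_phosphates stands in for a fixed amount of kinase activity that is
    spread evenly, on average, over n_eec complexes (an assumption).
    """
    mean_per_eec = total_phosphates / n_eec
    return n_eec * prob_at_least(required, mean_per_eec)


if __name__ == "__main__":
    threshold = 20  # hypothetical number of phosphates needed per EEC
    for label, activity in (("limiting kinase", 1000), ("added kinase", 5000)):
        # Hypothetical EEC counts standing in for increasing template
        row = [round(productive_complexes(n, activity, threshold), 1)
               for n in (25, 50, 100, 200)]
        print(label, row)
```

Under these assumptions, with limiting kinase activity the number of productive complexes falls
once the EECs outnumber what the kinase can fully phosphorylate, whereas with more kinase
activity it keeps rising over the same range, which is the qualitative pattern described for
FIG. 5. A single-hit requirement (threshold of 1) would not produce the decline.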

These results demonstrate that two classes of processive human RNA polymerase II
elongation complexes are formed after initiation at a promoter in vitro. The two classes are
differentiated by their elongation rates and sensitivities to DRB. DRB-sensitive transcripts were
elongated at a faster rate than the DRB-insensitive transcripts. However, both types of
complexes were able to transcribe RNAs up to 4000 nucleotides in length. HeLa extracts made
using a protocol similar to that used to produce Drosophila Kc cell NE generated a much lower
fraction of DRB-insensitive long transcripts. These extracts may have more of the negative
factors required or less of the DRB-insensitive factor acting like P-TEFb or may have different
levels of general elongation factors. Nevertheless, the results of this example clearly
demonstrate the functionality of the human cell system.

These results indicate that P-TEFb-like activity is limiting in HeLa nuclear extracts for
the generation of DRB-sensitive transcripts. Purified Drosophila P-TEFb stimulated the
production of DRB-sensitive transcripts up to 10-fold in the HeLa transcription system.
Therefore, even if other factors are required to generate DRB-sensitive transcripts, these factors
are not limiting. High levels of exogenous P-TEFb did not reduce the formation of
DRB-insensitive, processive complexes. If there is another factor which causes the transition into
DRB-insensitive productive elongation, it must not be in competition with P-TEFb to modify
EECs.

These results provide further evidence for the action of P-TEFb as a CTD kinase.
Multiple phosphorylation events are required to cause the shift from the IIa to the IIo form of the
polymerase, and P-TEFb is not stably associated with the elongation complex (Marshall and
Price, 1992; Marshall and Price, 1995). Therefore, the distributive action of P-TEFb depends on
the concentration of the factor and the concentration of its substrate (EECs). The results
presented here substantiate this conclusion. As the template concentration was increased in a
background of constant low level of P-TEFb found in HeLa NE, the amount of DRB-sensitive
runoff decreased as the number of EECs increased. When P-TEFb levels were elevated, the
same template titration now gave more DRB-sensitive runoff at every template concentration and
now, the number of DRB-sensitive runoff transcripts did not decrease as the number of EECs
increased. These results are consistent with the distributive action of P-TEFb such that at low
concentrations of the factor and high template concentrations many of the EECs do not receive
the required number of phosphates to cause the transition into productive elongation. When the
P-TEFb levels were higher more of the EECs were phosphorylated to the level required for the
transition to take place.
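
The dependence on template concentration described above can be illustrated with a simple numerical sketch. The model below is an illustrative assumption and is not part of the experimental work: it treats phosphorylation events as being spread independently and at random (Poisson) over the EECs, with a fixed total number of events set by the amount of P-TEFb, and it counts the complexes that receive at least a threshold number of phosphates. The function names, the threshold of 5 and the event totals of 100 and 1000 are hypothetical values chosen only for illustration.

```python
import math


def fraction_reaching_threshold(mean_events, required):
    """Return P(X >= required) for X ~ Poisson(mean_events)."""
    below = sum(
        math.exp(-mean_events) * mean_events ** k / math.factorial(k)
        for k in range(required)
    )
    return 1.0 - below


def productive_complexes(n_eecs, total_events, required):
    """Expected number of EECs that receive at least `required` phosphates.

    Assumes (hypothetically) that a fixed pool of `total_events`
    phosphorylation events is spread at random over `n_eecs` complexes.
    """
    mean_events = total_events / n_eecs
    return n_eecs * fraction_reaching_threshold(mean_events, required)


if __name__ == "__main__":
    required = 5  # hypothetical number of phosphates needed per EEC
    for label, total_events in (("limiting P-TEFb", 100), ("added P-TEFb", 1000)):
        for n_eecs in (10, 20, 40, 80):
            count = productive_complexes(n_eecs, total_events, required)
            print(f"{label}: {n_eecs} EECs -> {count:.1f} productive")
```

In this sketch, with the smaller pool of events the number of complexes reaching the threshold falls at the higher EEC numbers, whereas with the larger pool it continues to rise, which qualitatively mirrors the behavior seen in the template titration.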

EXAMPLE 4
Cloning and Expression of Drosophila P-TEFb

A. Peptide Sequences from P-TEFb 

Drosophila P-TEFb was purified as described in Marshall and Price (1995) and Example
1 (Methods, Section 3) from Drosophila Kc cell nuclear extract derived from 500 liters of Kc
cells (see Example 3, Methods, Section A4 for preparation). About 30 µg of pure P-TEFb was
run on a 6-15% gradient SDS-PAGE protein gel. The regions of the gel containing individual
subunits were excised and sent to the W.M. Keck Foundation Biotechnology Resource
Laboratory (Yale University, New Haven CT). Using standard techniques, each protein was
subjected to in situ proteolysis and the resulting peptides were separated using reversed phase
HPLC. Individual HPLC peaks were analyzed by MALDI mass spectrometry and fractions
containing predominantly one species were subjected to sequencing. Two or three peptide
sequences were obtained from each subunit.

B. Cloning of Drosophila P-TEFb Small Subunit

Two peptide sequences, MLQQPSGSTPSNV (SEQ ID NO:31) and
ADTALNHDFFWTDPMPS (SEQ ID NO:32), were obtained as described above in Section 1
from sequencing the small subunit fragments. Based on these two peptides, two degenerate
primers,

5'-GGAATTCNATGYTNCARCARCC (SEQ ID NO:9) which encoded the region
MLQQP (SEQ ID NO:33) of SEQ ID NO:31 and

5'-AACTGCAGTCCARAARAARTCRTGRTT (SEQ ID NO:10) which encoded the
region NHDFFWT (SEQ ID NO:34) of SEQ ID NO:32,

were designed. The template for the PCR™ reaction was Drosophila Kc cell cDNA made
through reverse transcription using a 3' RACE™ (Rapid Amplification of cDNA Ends) kit
according to the manufacturer's directions (Gibco, Gaithersburg, MD). A 1.1 kb cDNA fragment
was amplified by PCR™ (30 cycles of 0.5 min at 94°C, 1 min at 55°C and 1.5 min at 72°C)
using Vent™ DNA polymerase. A 0.7 kb fragment from the 3' portion of the PCR™ product was
cloned into Bluescript SK (Stratagene) after digestion with Pst I (one internal Pst I site in the
PCR™ product and one Pst I site designed into the 3' end of the splice site of Bluescript SK).
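
The relationship between the degenerate primers and the peptide regions they encode can be checked with a short sketch. The code below is offered only as an illustration: it expands the standard IUPAC ambiguity codes and tests whether each degenerate codon can encode the corresponding residue under the standard genetic code. The division of each primer into an 8-nucleotide 5' adapter (GGAATTCN and AACTGCAG, the latter containing the Pst I site) and a coding portion is an assumption made for this example, as are all function and variable names.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def encoded_residues(codon):
    """All amino acids that a degenerate codon can encode."""
    return {CODON_TABLE["".join(c)] for c in product(*(IUPAC[b] for b in codon))}


def covers_peptide(degenerate_coding, peptide):
    """True if each complete codon can encode the matching peptide residue."""
    codons = [
        degenerate_coding[i:i + 3]
        for i in range(0, len(degenerate_coding) - 2, 3)
    ]
    return all(aa in encoded_residues(cd) for cd, aa in zip(codons, peptide))


# Coding portions after removing the assumed 8-nt adapters.
forward_coding = "GGAATTCNATGYTNCARCARCC"[8:]
reverse_coding = reverse_complement("AACTGCAGTCCARAARAARTCRTGRTT"[8:])

if __name__ == "__main__":
    print(covers_peptide(forward_coding, "MLQQ"))
    print(covers_peptide(reverse_coding, "NHDFFW"))
```

Only complete codons are compared, so the partial final codon of each primer (the first bases of the proline and threonine codons, respectively) is not evaluated in this sketch.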

The 3' region of the full length cDNA sequence was obtained using a Gibco 3' RACE™ kit
with two specific primers from the 0.7 kb fragment,
5'-TGTCAAGGATCAAACCGGCTGTGAT (SEQ ID NO:11) and
5'-CGAATTCCAAGAAACGCATCGATGC (SEQ ID NO:12).
The 5' region of the full length cDNA was obtained using a Gibco 5' RACE™ kit with three gene
specific primers from the 0.7 kb fragment,
5'-AGACCTGCCAAATCGTGT (SEQ ID NO:13),
5'-AGAAGGTGGATCTGTAACCATTCGT (SEQ ID NO:14) and
5'-GGAATTCAGATCTCGATCAGATTCA (SEQ ID NO:15).

The coding sequence of the small subunit was cloned by reverse transcription PCR™
(RT-PCR). First, the cDNA of the small subunit was generated in a reverse transcription using a
primer 5'-TTACTACTCGAGCTACCAAACCCGGTC (SEQ ID NO:16) and Drosophila
embryonic mRNA as the template. Second, the coding sequence was produced in a 30-cycle
PCR™ reaction using Vent™ DNA polymerase, two primers
5'-TAAGCAAGCTTCTATGGCGCACATGTCC (SEQ ID NO:17) and
5'-TTACTACTCGAGCTACCAAACCCGGTC (SEQ ID NO:18)
and the cDNA as the template. The PCR™ cycles were performed for 0.5 min at 94°C, 1 min at
55°C and 3.5 min at 72°C according to the manufacturer's directions (NEN, Boston, MA).
Finally, the coding sequence was digested with Hind III and Xho I and cloned into pET-21a
vector.

C. Cloning of Drosophila P-TEFb Large Subunit:

Three peptide sequences, SPEWPDI (SEQ ID NO:35), WYFSNDQLANSPSR (SEQ ID
NO:36) and TVHGMPPFEQQLPY (SEQ ID NO:37), of the large subunit were obtained as
described above in Example 4, Section 1. Based on these amino acid sequences, three
degenerate primers,

5'-GGAATTCTGGTAYTTYWSNAAYGA (SEQ ID NO:19) which encoded the region
WYFSND (SEQ ID NO:38) of SEQ ID NO:36,

5'-CGGGATCCTGYTCRAANGGNGGCAT (SEQ ID NO:20) which encoded the region
MPPFEQ (SEQ ID NO:39) of SEQ ID NO:37, and

5'-CGGGATCCAANGGNGGCATNCCRT (SEQ ID NO:21) which encoded the region
HGMPPF (SEQ ID NO:40) of SEQ ID NO:37

were designed.

A 1.6 kb cDNA fragment of the large subunit was cloned through nested PCR™
reactions. First, a 35-cycle PCR™ reaction was performed using Taq DNA polymerase, the
degenerate primers, SEQ ID NO:19 and SEQ ID NO:20, and the Drosophila embryonic cDNA
obtained from Clontech (Palo Alto, CA) as the template. The PCR™ cycles were performed for
1 min at 94°C, 1 min at 55°C and 3 min at 72°C. Second, a 1.6 kb cDNA fragment was
amplified in another 35-cycle PCR™ reaction using the total PCR™ products from the first
amplification as the template, the primers of SEQ ID NO:19 and SEQ ID NO:21 and the same
reaction conditions.

The 1.6 kb fragment of the second round of amplification was digested with EcoRI and
BamHI to yield three fragments of 0.9 kb, 0.6 kb and 0.1 kb. The 0.9 kb and 0.6 kb fragments
were cloned into Bluescript SK (Stratagene), yielding a 1.5 kb insert, and sequenced using
fluorescent automated sequencing at the DNA Facility at the University of Iowa, following
manufacturer's recommended protocols.

The 3' region of the full length cDNA sequence was obtained using the 3' RACE™ kit
(Gibco) according to the manufacturer's directions. Three nondegenerate, specific primers,
5'-ATCACGACACCACCAGAGCTGTTA (SEQ ID NO:22),
5'-CGAATTCAGATCGTGAACGGGA (SEQ ID NO:23) and
5'-CGAATTCAGGCGCTAGCAATG (SEQ ID NO:24),
were designed based on the 1.5 kb cDNA sequence obtained. The 5' region of the cDNA sequence
was obtained using a 5' RACE™ kit (Gibco) according to the manufacturer's directions. Three
gene specific primers,
5'-GAAAGGCGTAGAACCGA (SEQ ID NO:25),
5'-GCTGACCCATTTCCTGTATCAGATAG (SEQ ID NO:26) and
5'-GGAATTCTTCTGCTTGGCGAAT (SEQ ID NO:27),
were designed based on the 1.5 kb sequence and used with the 5' RACE™ kit (Gibco).

The entire coding sequence of the large subunit was cloned by reverse transcription
PCR™ (RT-PCR). First, the cDNA of the large subunit was generated in a reverse transcription
reaction using the primer 5'-GGGAATTCGAGGTTCTATACATAT (SEQ ID NO:28) and
Drosophila embryonic mRNA as the template. Second, a 4 kb cDNA fragment containing the
coding sequence was produced in a 35-cycle PCR™ reaction using Expand™ polymerase
(Boehringer Mannheim, Indianapolis, IN), the two primers
5'-CTGTGTGAATGGAATCTGTGATGTG (SEQ ID NO:29) and
5'-GGGAATTCGAGGTTCTATACATAT (SEQ ID NO:28)
and the Drosophila cDNA as the template. The PCR™ cycles were performed according to the
following protocol, supplied by the manufacturer:

Amplification reactions were denatured for 2 min at 94°C; 10 cycles, 10 sec at 94°C,
30 sec at 58°C, 3 min at 68°C; 5 cycles, 10 sec at 94°C, 30 sec at 58°C, 4.5 min at 68°C;
5 cycles, 10 sec at 94°C, 30 sec at 58°C, 6 min at 68°C; 5 cycles, 10 sec at 94°C, 30 sec at
58°C, 7.5 min at 68°C; 5 cycles, 10 sec at 94°C, 30 sec at 58°C, 9 min at 68°C; 1 cycle,
7 min at 68°C.

The resulting amplification product was reamplified in another PCR™ reaction using Vent™
DNA polymerase, the amplified 4 kb cDNA fragment as the template and the primers
5'-TATCCCGGGTCATATGAGTCTCCTAGCC (SEQ ID NO:30) and
5'-GGGAATTCGAGGTTCTATACATAT (SEQ ID NO:28).
Finally, the coding sequence was digested with Sma I and EcoR I and cloned into pET-21a
vector.

D. Characterization of P-TEFb

cDNA fragments of P-TEFb large and small subunits were amplified from the total
Drosophila embryonic cDNA by using the degenerate primers derived from the peptide
sequences (see Sections B and C above). Based on the cDNA fragments, the 5' end and 3' end
sequences of P-TEFb cDNAs were obtained from Drosophila Kc cell mRNAs using the RACE
technique. The full length small subunit cDNA sequence (SEQ ID NO:1) was 1.45 kb
containing an open reading frame of 1.21 kb, beginning at position 115 of SEQ ID NO:1. The
large subunit cDNA (SEQ ID NO:3) was 4.33 kb with a coding region of 1.10 kb, beginning at
position 716 of SEQ ID NO:3. The encoded small subunit was found to be a cyclin-dependent
kinase (CDK) of 46.8 kDa (SEQ ID NO:2), while the large subunit was a cyclin of 121.1 kDa
(SEQ ID NO:4). All data derived from the sequences are consistent with the identified properties
of P-TEFb.

The amino acid sequence, SEQ ID NO:2, identified the small subunit of Drosophila
P-TEFb as a member of the Cdc2-like cyclin dependent kinase family with over 40% identity to
S. pombe Cdc2. Thus P-TEFb is a cyclin-dependent kinase (CDK) because the small subunit has
all the conserved subdomains found in CDKs and the large subunit has the conserved cyclin box
domain. The kinase activity of a CDK is tightly regulated by four conserved mechanisms. A
CDK can be activated by the binding of a cyclin subunit and the phosphorylation of a conserved
threonine residue at the T-loop in the catalytic subunit. A CDK-cyclin complex can be either
inhibited by the phosphorylation of a threonine residue and a tyrosine residue at the ATP-binding
site in the catalytic domain, or inactivated by the binding of a family of small proteins termed
CKIs (Morgan, 1995).

Whereas the peptide sequences of catalytic subunits are well conserved in the CDK
family, the sequences of cyclins are rather divergent. Almost all of the cyclin subunits contain a
diverse sequence of about 100 residues called a cyclin box, which is predicted to have a
conserved helix-rich secondary structure. Helical fold prediction for the cyclin box is described
in Proteins 24, 1-17.

The full length cDNA of the Drosophila large subunit encoded a 1097 amino acid protein
(SEQ ID NO:4). As expected, the protein contained a canonical cyclin box, presumably to allow
binding to the small cdk subunit. The carboxyl-terminal two thirds of the protein did not reveal
any specific protein motifs; however, the middle third is highly charged and likely to be
somewhat unstructured, while the carboxyl-terminal third has high levels of potential helical
regions, suggesting that it might fold into a specific domain. Attempts to express the cDNA in
E. coli, even with the small subunit, only gave insoluble truncated proteins. However, expression
of both subunits in a baculovirus expression system gave rise to recombinant proteins with
identical mobility to authentic P-TEFb purified from Drosophila Kc cells on a silver stained
SDS-PAGE gel. Purification of the recombinant protein was aided by the addition of a HIS-tag to
the carboxyl-terminus of the small subunit. Both subunits were quantitatively recovered on a
nickel column, indicating that the subunit interactions were strong. This protein had levels of
DRB-sensitive CTD kinase activity indistinguishable from authentic P-TEFb and was able to
functionally replace authentic P-TEFb during transcription.

Northern blotting was used to further confirm the presence of the P-TEFb mRNAs in
Drosophila Kc cells, embryos and female adults. A probe made from the full length of the small
subunit cDNA recognizes an mRNA band around 1.7 kb. Three probes that were generated from
the 3', internal and 5' cDNA sequences of the large subunit cDNA recognize an mRNA band of
approximately 5.3 kb in Drosophila adult females. Final wash stringency was 0.1x SSC at 68°C.

E. Expression of P-TEFb in E. coli

The full length P-TEFb cDNAs were amplified from embryonic mRNA by reverse
transcription PCR™ (RT-PCR™) and cloned into a pET21a expression vector in E. coli as
described above in Sections B and C. Constructs were made to express (1) the small subunit, (2)
a GST-small subunit fusion protein, (3) the large subunit, and (4) the GST-large subunit fusion
protein. Two other constructs were used to co-express the small subunit and the large subunit, or
the small subunit and the GST-large subunit fusion protein.

Six constructs were made to express P-TEFb in E. coli using a T7 polymerase-dependent
expression system. First, a GST coding sequence from pEG(KT) was used to replace the
NdeI/SalI fragment in pET21a (Novagen) to construct a GST-expression plasmid
(pET21a-GST). Second, the large subunit coding sequence was digested with SmaI and EcoRI and
cloned into pET21a-GST vector to construct a plasmid (pET21a-GST-BL) for expression of the
GST-large subunit fusion protein. The small subunit coding sequence was digested at the designed
Hind III and Xho I sites and cloned into pET21a-GST vector to construct a plasmid
(pET21a-GST-BS) for expression of the GST-small subunit fusion protein. Third, the GST
coding sequence was removed from the pET21a-GST-BL and the pET21a-GST-BS by Nde I
digestion and the recombinant vectors were religated to construct a plasmid (pET21a-BL) for
expression of the large subunit and the other plasmid (pET21a-BS) for expression of the small
subunit.

The final constructs were prepared by obtaining the coding region of the GST-large
subunit fusion protein via the Xba I digestion of pET21a-GST-BL, and inserting it into pET21a-BS
to construct a plasmid (pET21a-GST-BL-BS) for the co-expression of the GST-large subunit
fusion protein and the small subunit. The coding region of the large subunit was obtained by the
Xba I digestion of pET21a-BL, and inserted into pET21a-BS to construct a plasmid
(pET21a-BL-BS) for the co-expression of the large subunit and the small subunit.

All constructs gave rise to appropriately sized protein products when transformed into the
DE3 host and induced by adding isopropylthio-beta-galactoside (IPTG): P-TEFb small subunit
(43 kDa); GST-P-TEFb small subunit (73 kDa); P-TEFb large subunit (124 kDa); GST-P-TEFb
large subunit (154 kDa). All proteins were insoluble when expressed alone or in combinations
with both large and small subunits.

Two constructs were made to express P-TEFb in insect cells by using a Baculovirus
expression system. First, the coding region of the small subunit was obtained by the Hind III and
Xho I double digestion of pET21a-GST-BS, and inserted into pBAC4X-1 (from Novagen) to
construct a plasmid (pBAC4X-1-BS) for the expression of the small subunit. Second, the coding
region of the large subunit was obtained by the Xma I and EcoR I double digestion of
pET21a-GST-BL, and inserted into pBAC4X-1-BS to construct a plasmid (pBAC4X-1-BL-BS) for the
co-expression of the large subunit and the small subunit.

The protocol for expression in Sf9 cells is as follows:

1. Linearized BaculoGold DNA and transfer vector:

a. DNA templates:

Linearized BaculoGold DNA (2.5 µg/25 µl)

Transfer vector containing large and small subunits of Drosophila P-TEFb
pBAC4X-1-BL-BS (1 µg/µl)

b. Buffers (see Example 3, Section A.2):

Buffer A (Grace's Medium with 10% Fetal Calf Serum)

Buffer B (25 mM Hepes pH 7.1, 125 mM CaCl2, 140 mM NaCl)

2. Co-transfection:

Sf9 cells are cultured in TNM-FH medium (in suspension) to log phase. 2 million cells
are placed in a T-25 flask and the cells are allowed to attach to the bottom of the flask for 30
min. The medium is removed from the flasks and 1 ml of Buffer A is added to each flask such
that the buffer covers all cells. 4 µg transfer vector pBAC4X-1-BL-BS and 0.5 µg BaculoGold
DNA are mixed together in a sterile tube for 5 min. 1 ml of Buffer B is added to each tube
containing the DNA mixture. All of the DNA solution (now in Buffer B) is pipetted, drop by
drop, onto the cells such that the buffer covers all cells. Minor precipitation should be visible in
the flask. Cells are left undisturbed at 27°C for 4 h. After 4 h the incubation medium is changed.
The medium is removed from the transfection and control flasks. 3 ml TNM-FH + 10% FCS is
added to each flask and cells are incubated at 27°C for ~4-5 days.

EXAMPLE 5
Human P-TEFb

A. The Small, Kinase Subunit 

The human homolog of the small subunit of Drosophila P-TEFb was first identified by
comparing the protein sequence derived from direct protein sequencing of the small subunit of
Drosophila P-TEFb with genetic databases accessible via a BLAST genetic analysis search from
the National Institutes of Health (NIH). Using the complete sequence of the Drosophila subunit
obtained from translation of a full length cDNA, the inventor identified the existence of a human
cDNA homologous to the Drosophila P-TEFb small subunit protein. The search of the protein
database revealed a human protein, PITALRE, SEQ ID NO:6 (Grana et al., 1994), that exhibits
about 72% identity and about 83% similarity to the Drosophila protein, SEQ ID NO:2. The high
level of sequence similarity indicated that PITALRE is a potential homologue of the small
subunit of Drosophila P-TEFb and, therefore, may be a component of human P-TEFb. Two
kinases from S. cerevisiae, SGV1 (Irie et al., 1991) and CTK1 (Sterner et al., 1995), each share
43% identity with PITALRE, SEQ ID NO:6, and the small subunit of Drosophila P-TEFb, SEQ
ID NO:2. Although sequence similarity does not allow the prediction of a potential yeast
homologue, CTK1 has recently been demonstrated to increase the elongation efficiency of RNA
polymerase II (Lee and Greenleaf, 1997).

The human protein identified was first cloned using PCR™ with degenerate
oligonucleotides derived from cell division cycle 2 (CDC2) family sequences (Grana et al.,
1994). This protein was called "PITALRE" because of the presence of those amino acids in a
characteristic location in the kinase subunit. The initial characterization of "PITALRE" included
a sequence comparison with other kinases, a northern blot showing ubiquitous expression, an
immunoprecipitation study suggesting the association of other proteins with "PITALRE", and a
western blot suggesting that the protein was localized in the nucleus. The highest expression of
the protein was in the liver and placenta, which caused the authors to speculate that "PITALRE"
was involved in specialized functions in certain cell types (Grana et al., 1994). Although Grana
et al. (1994) investigated the function of the protein "PITALRE" and its interaction with the
tumor suppressor gene product pRB, they did not suggest that it may interact with RNA
polymerase II. A later paper mapped the chromosomal location of "PITALRE" to a region found
to be involved in tumors and breast cancer (Bullrich et al., 1995).

B. The Large, Cyclin-like Subunit 

The inventor realized that human P-TEFb, like the Drosophila protein, is likely to 
comprise at least two subunits, one a cyclin-dependent kinase (CDK) and the other containing a 
cyclin box domain. As the amino terminal portion of the large subunit has a cyclin box, the 
inventor contemplated that this portion of the large subunit interacts with the small kinase 
subunit. The carboxyl terminal portion of the large subunit is larger than CTDs previously found 
attached to cyclin boxes, leading the inventor to envision that this portion of the large subunit 
interacts with other factors, for example the HIV viral protein Tat. 

1. Cloning of Human P-TEFb Large Subunits: 

Considerable difficulty was encountered in the initial cloning of the cyclin subunits of
human P-TEFb, although the ultimate success of the inventor means that this can now be
routinely achieved. Initially, searches of protein sequence databases revealed no homologous
proteins of the Drosophila P-TEFb large subunit. Only by searching the EST database, with the
BLAST genetic analysis (National Institutes of Health), for homologues of the Drosophila
P-TEFb large subunit were three relatively short human EST sequences, zr91f10.s1, yd48c03.r1
and nc70h05.r1, found that the inventor realized may be part of the human P-TEFb large subunit.

Without having the Drosophila large subunit sequence it would have been impossible to
identify any human homologues. The EST database contains very short sequences with
numerous errors. For example, high quality sequence for zr91f10.s1 stops at 214 bases; for
yd48c03.r1 at 204 bases; and for nc70h05.r1 at 339 bases. Further, the EST clones often
contained 5' UTR, 3' UTR or unspliced sequence that could not be compared to the Drosophila
sequence. Even when the EST sequences were found to encode protein sequence similar to the
Drosophila sequence, errors in the sequence caused the reading frame to be lost. Without the
Drosophila sequence it would have been impossible to know whether the sequence was
important or not. Each of the EST sequences does represent a human cDNA sequence, but no
function was hypothesized for any of the EST sequences. Further clone distribution information
for all three EST sequences can be found through either the National Center for Biotechnology
Information at the National Institutes of Health or through the I.M.A.G.E. Consortium/LLNL at:
www-bio.llnl.gov/bbrp/image/image.html.



Based on the sequence from zr91f10.s1, three primers,
5'-TTCCCACCAATGCTTTCC-3' SEQ ID NO:51, 5'-CCATCAGTTGATACAGGGATCT-3'
SEQ ID NO:52, and 5'-GGAATTCAGAAGGTTGTAAGATGC-3' SEQ ID NO:53 were
designed and used to obtain the 5' region of a total cDNA sequence from human brain poly A+
RNA using a 5' RACE kit (Gibco). The 3' region of a total sequence was obtained by using three
primers, 5'-ACACACAGATGTGGTGAAATGTACCCA-3' SEQ ID NO:54,
5'-GCATCTTACAACCTTCTG-3' SEQ ID NO:55, and
5'-GGAATTCATGGAAAGCATTGGTGGGAAT-3' SEQ ID NO:56, a brain Marathon-ready
cDNA and a Marathon cDNA Amplification Kit (Clontech). The total cDNA sequence obtained
was 4528 bp and termed HBL1, SEQ ID NO:43.

The total cDNA sequence, HBL1, (SEQ ID NO:43) was amplified by RT-PCR using
primers 5'-CCTCCACTACTGGTTTGCCTGG-3' SEQ ID NO:57,
5'-GGACTAGTATAAATATGGCGTCGGGCCGTG SEQ ID NO:58, and
5'-GGAGATCTTACATGTTCATTCCTTGGG SEQ ID NO:59 and Expand polymerase under
the following conditions:

step 1     94°C     2 min
step 2     94°C     10 sec
step 3     65°C     30 sec
step 4     68°C     3 min
step 5     Go to step 2 for 9 times
step 6     94°C     10 sec
step 7     65°C     30 sec
step 8     68°C     4.5 min
step 9     Go to step 6 for 4 times
step 10    94°C     10 sec
step 11    65°C     30 sec
step 12    68°C     6 min
step 13    Go to step 10 for 4 times
step 14    94°C     10 sec
step 15    65°C     30 sec
step 16    68°C     7.5 min
step 17    Go to step 14 for 4 times
step 18    94°C     10 sec
step 19    65°C     30 sec
step 20    68°C     9 min
step 21    Go to step 18 for 4 times
step 22    68°C     7 min
step 23    stop
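
The cycling program listed above can be summarized with a short sketch that totals the number of three-step amplification cycles and the programmed hold time. The sketch assumes that "Go to step N for k times" means k further repeats of the preceding three steps (so that steps 2-5 amount to 10 cycles), an interpretation that matches the "10 cycles ... 5 cycles" wording of the corresponding Drosophila program in Example 4. Ramp times between temperatures are ignored, and the data layout and function names are hypothetical.

```python
# Each entry: (number of repeats, [(temperature in °C, hold time in seconds), ...])
PROGRAM = [
    (1, [(94, 120)]),                           # step 1: initial denaturation
    (10, [(94, 10), (65, 30), (68, 180)]),      # steps 2-5
    (5, [(94, 10), (65, 30), (68, 270)]),       # steps 6-9
    (5, [(94, 10), (65, 30), (68, 360)]),       # steps 10-13
    (5, [(94, 10), (65, 30), (68, 450)]),       # steps 14-17
    (5, [(94, 10), (65, 30), (68, 540)]),       # steps 18-21
    (1, [(68, 420)]),                           # step 22: final extension
]


def amplification_cycles(program):
    """Total repeats of the multi-step (denature/anneal/extend) blocks."""
    return sum(repeats for repeats, steps in program if len(steps) > 1)


def total_hold_seconds(program):
    """Sum of all programmed hold times, ignoring temperature ramps."""
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for repeats, steps in program
    )


if __name__ == "__main__":
    print("amplification cycles:", amplification_cycles(PROGRAM))
    print("hold time (min):", total_hold_seconds(PROGRAM) / 60)
```

The stepwise lengthening of the 68°C extension in later cycles is the feature of the program that this layout makes explicit.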





Two coding sequences were amplified: HBL1-1 (2091 bp), SEQ ID NO:44, and HBL1-2
(2190 bp), SEQ ID NO:46. They were cloned in a plasmid pBAC 4X-1-HBS, which contains
human P-TEFb small subunit sequence. These two sequences were also amplified by using
HeLa RNA.

Based on the EST sequence of nc70h05.r1, the three primers
5'-GGAGACAAGTATGTGCTACCTTGATGACA-3' SEQ ID NO:60,
5'-GGAATTCGGGCTGCTCCTCCACTTTAG-3' SEQ ID NO:61, and
5'-GGAATTCGCTGCTGGAGCCACAGAA-3' SEQ ID NO:62 were used to obtain the 5' region
of a total cDNA sequence from human bone marrow Marathon-ready cDNA (Clontech) by using
a Marathon cDNA Amplification Kit (Clontech). Using the same cDNA and RACE kit, the
3' region was obtained by using the primers 5'-GTGTCACTGAAAGAATACCG-3' SEQ ID
NO:63 and 5'-GGAATTCAGGTGGAGATAAAGCTGC-3' SEQ ID NO:64, which were based
on the EST sequence of yd48c03.r1. A total cDNA sequence was obtained and designated HBL3
(SEQ ID NO:48). According to the length of the PCR products, the whole HBL3 was 2.8 kb.
The first 2.36 kb of the PCR product was sequenced. The rest of the sequence was 3' UTR and
was not sequenced.

The total cDNA sequence, HBL3, (SEQ ID NO:48) was then amplified by RT-PCR using
the primers 5'-GCTCTAGATAAATATGGAGGGAGAGAGGAA-3' SEQ ID NO:65,
5'-GGAATTCTTACTTAGGAAGGGGTGGAAGTG-3' SEQ ID NO:66, and
5'-GGAATTCTTACTTAGGAAGGGGTGGAAGTGGTGGAGGAGGTT-3' SEQ ID NO:67
from human HeLa cell mRNA. Conditions for amplification of the large subunit using
eLONGase (Life Technologies) were denaturation for 30 sec at 94°C; then 35 cycles, 20 sec at
94°C, 30 sec at 55°C, 2.2 min at 68°C; then 5 min at 68°C for each reaction.

A single coding sequence for HBL3 (2181 bp), SEQ ID NO:49, was amplified and cloned 
into a plasmid pBAC 4X-1-HBS as described previously. 

2. Analysis of Human P-TEFb Large Subunit cDNAs and Proteins: 

The coding sequence HBL1-1 (SEQ ID NO:44) encodes one complete protein (SEQ ID 
NO:45). Comparison of the cDNA sequences of HBL1 (SEQ ID NO:43) and HBL1-1 shows 
that the entire coding sequence HBL1-1 is contained within HBL1. The beginning of the start 
codon (ATG) of the encoded protein corresponds to position 1 of HBL1-1 and position 46 of 
HBL1, respectively. Thus, the total cDNA sequence of HBL1 includes 45 bp upstream of the 
start codon and 2392 bp downstream of the stop codon (TAA). Interestingly, HBL1-2 (SEQ ID
NO:46) contains a 99 bp intron which is not present in either HBL1 (SEQ ID NO:43) or
HBL1-1 (SEQ ID NO:44) but is in-frame and can be translated. This 99 bp intron begins at position
1927 and ends at position 2025 of coding sequence HBL1-2 (SEQ ID NO:46). If the 99 bp
intron of HBL1-2 is excluded then HBL1-1 and HBL1-2 are identical.

The coding sequence of HBL3 (SEQ ID NO:49) encodes a different protein which is
completely contained within the total cDNA sequence of HBL3 (SEQ ID NO:48). The beginning
of the start codon (ATG) of the encoded protein corresponds to position 1 of the coding sequence
of HBL3 (SEQ ID NO:49) and position 45 of the total cDNA sequence of HBL3 (SEQ ID
NO:48), respectively. Thus, the total cDNA sequence of HBL3 includes 44 bp upstream of the
start codon and 135 bp downstream of the stop codon (TAA).
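
The positional figures given above for HBL1, HBL1-2 and HBL3 can be cross-checked with simple arithmetic. The sketch below is an illustration only and uses solely numbers stated in this example; the function names are hypothetical. It assumes 1-based, inclusive sequence positions.

```python
def implied_total_length(start_pos, coding_len, downstream_len):
    """Length implied by a 1-based start codon position, the coding
    sequence length and the number of bases after the stop codon."""
    upstream_len = start_pos - 1
    return upstream_len + coding_len + downstream_len


def inclusive_span(first, last):
    """Number of positions from `first` to `last`, both included."""
    return last - first + 1


if __name__ == "__main__":
    # HBL1: start codon at position 46, 2091 bp coding, 2392 bp downstream.
    print("HBL1 total:", implied_total_length(46, 2091, 2392))
    # HBL3: start codon at position 45, 2181 bp coding, 135 bp downstream.
    print("HBL3 sequenced:", implied_total_length(45, 2181, 135))
    # HBL1-2 intron: positions 1927 to 2025 of the coding sequence.
    intron = inclusive_span(1927, 2025)
    print("intron bp:", intron, "codons:", intron // 3)
    print("HBL1-2 coding:", 2091 + intron)
```

Under these assumptions the HBL1 figures sum to the stated 4528 bp, the HBL3 figures sum to the 2.36 kb that was sequenced, and the in-frame 99 bp intron accounts for the difference between the 2091 bp and 2190 bp coding sequences.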

Comparison of the total cDNA sequences, HBL1 (SEQ ID NO:43) and HBL3 (SEQ ID
NO:48), revealed that the EST sequence zr91f10.s1 was homologous to a small region of HBL1
and a perfect match to the complement of positions 603-665 of SEQ ID NO:43. It is noteworthy
that only 50 bp of the Drosophila large subunit sequence is homologous to this same EST
sequence zr91f10.s1. In contrast, the EST sequences nc70h05.r1 and yd48c03.r1 were
homologous to HBL3 and near perfect matches to positions 210-417 and 750-884, respectively,
of SEQ ID NO:48.

Comparison of the encoded proteins showed that the proteins encoded by HBL1-1 and
HBL1-2, SEQ ID NO:45 and SEQ ID NO:47, respectively, are identical except for a 33 amino
acid contiguous stretch which is encoded by the 99 bp intron in HBL1-2. Comparison of the protein
encoded by HBL1-1, SEQ ID NO:45, and the protein encoded by HBL3, SEQ ID NO:50, shows
that the two human proteins share an overall absolute identity of 54% and an overall relative
similarity of 70%. Within the region defined as the cyclin box, positions 1-252 of HBL1-1 and
positions 1-253 of HBL3, the two proteins share an identity of about 81% to each other.
Downstream of the cyclin box, the two proteins, SEQ ID NO:45 and SEQ ID NO:50, are about
46% identical to each other.

When compared to the Drosophila large subunit protein, SEQ ID NO:4, both human
proteins have an identity of about 65% to the Drosophila cyclin domain box (positions 1-280 of
SEQ ID NO:4). The two human proteins were 81% similar to each other in this same region.
However, downstream of the cyclin box, the two human proteins are only about 25% identical to
the Drosophila protein. Overall, the Drosophila large subunit is about 42% identical to the
protein encoded by HBL1-1 (SEQ ID NO:45) and 34% identical to the protein encoded by HBL3
(SEQ ID NO:50).

As was seen for the Drosophila large subunit, the amino-terminal domain contained a
cyclin box that was 65% identical to the Drosophila protein. The two human proteins were 81%
similar in this region. Both potential human proteins lacked the highly charged, unstructured
central region, but had a slight similarity (25% identity) to the carboxyl-terminal domain.
Western blotting with antibodies to each of the two proteins that were expressed in E. coli is
used to determine the size of both encoded proteins. The two cyclin subunits were co-expressed
with a HIS-tagged human kinase subunit (PITALRE) in a baculovirus system. As was found for
Drosophila P-TEFb, the large subunit was quantitatively recovered after nickel column
chromatography, indicating that subunit interactions are strong. In contrast to PITALRE alone,
both kinase/cyclin pairs (PITALRE/HBL1-1 and PITALRE/HBL3) gave rise to very strong
DRB-sensitive phosphorylation of the CTD of RNA polymerase II. This clearly indicates that human
P-TEFb is a cyclin dependent kinase, since the activity of PITALRE was stimulated
approximately 100 fold by the cyclin subunit. The two subunit protein obtained after expressing
HBL1-1 large subunit with PITALRE was able to function in transcription when added back to a
Drosophila nuclear extract depleted of P-TEFb using anti large subunit antibodies. Although it
is not clear if HBL1-1 encodes the proper length protein, HBL3 encodes a protein with identical
mobility to the 87 kDa protein found in PITALRE immunoprecipitates. This is confirmed by
using antibodies for the human proteins and western blot analysis. It is likely that antibodies
against the protein encoded by HBL3 react with the 87 kDa band seen in PITALRE
immunoprecipitates.

3. Expression of Human P-TEFb: 

The coding regions for HBL1-1, HBL1-2 and HBL3 were individually expressed in a
Baculovirus expression system (PharMingen, San Diego CA) along with the HIS-tagged small
subunit "PITALRE". A nickel column was used to purify the "PITALRE" from the cytoplasm of
infected Sf9 cells. The "PITALRE" was found to associate with each respective putative large
subunit protein, indicating that the cyclin domain of each protein can interact with the
kinase subunit.

Using the CTD kinase assay, as previously described in Example 2, Section A.7,
HBL1-1 (SEQ ID NO:45), HBL2 (SEQ ID NO:47) and HBL3 (SEQ ID NO:50) were substituted
for the large subunit in the Drosophila transcription system. The activity of each recombinant
protein is shown in Table 1. Alone, "PITALRE" has little kinase activity and no detectable
activity in transcription. Thus, these data demonstrate that the proteins encoded by HBL1-1,
HBL1-2 and HBL3 can functionally act as the P-TEFb large subunit in Drosophila and likely
act in a homologous manner in humans.



Table 1. Recombinant proteins expressed in vitro using a kinase assay and a HIS-tagged small 
subunit. 



Subunits     Complex formed?   Kinase activity   Transcription activity

DBS                            very little       none
HBS                            very little       none
DBS+DBL      yes               high              high
HBS+DBS      no
HBS+HBL1     yes               high              high
HBS+HBL2     yes               ND                ND
HBS+HBL3     yes               high              ND



D is Drosophila, H is human, BS is P-TEFb small subunit, BL is P-TEFb large subunit, ND is
not determined. If the large subunit was recovered along with the HIS-tagged small subunit from 
the nickel column it was assumed that a complex was formed. The kinase assay used RNA 
polymerase II as a substrate and transcription activity was determined by adding the recombinant 
protein into a Drosophila extract depleted of P-TEFb using antibodies against the large subunit. 



4. Tissue Distribution of Expression of Human P-TEFb Subunits. 

The expression of mRNAs encoding PITALRE and the two human cyclin subunit clones 
was examined across a wide variety of human tissues using northern blot analyses. All three 
were ubiquitously expressed. This is consistent with a general requirement for P-TEFb in all 
tissues. mRNAs encoding both cyclin subunits were present in all tissues, suggesting that a tissue
specific role for one of the subunits is not likely. 

5. Reconstitution of Tat Transactivation with Recombinant Human P-TEFb.

To confirm that the recombinant human P-TEFb comprised of PITALRE and cyclin 
HBL1-1 could reconstitute Tat transactivation, an add back study was performed on HeLa 
extracts depleted of PITALRE and, therefore, depleted of P-TEFb. Conditions were as described
in Example 6. FIG. 6 shows that Tat was able to stimulate the DRB sensitive runoff transcript 
from the HIV-LTR. 

6. Generation of an Inducible PITALRE Kinase Knockout HeLa Cell Line.

A tetracycline inducible HIS-tagged PITALRE kinase knockout (D167N) expression
vector was constructed and stably transfected into Tet-on HeLa cells using co-transfection with a
hygromycin expression plasmid. Cell lines were selected in the presence of hygromycin, but in
the absence of doxycycline. Several lines were obtained that exhibited doxycycline induced
expression of a protein with slightly lower mobility than PITALRE that reacted with
anti-PITALRE antibodies. A time course indicated that after 30 hrs of induction maximal levels of
the inactivated PITALRE were produced. The level of the induced PITALRE mutant was equal
to the level of endogenous PITALRE as analyzed by SDS PAGE followed by immunoblotting
with anti-PITALRE antibodies. A cell line that expressed a great excess of transfected
PITALRE was not obtained, suggesting that the levels of PITALRE may be controlled by
post-transcriptional mechanisms.

7. Optional Purification of Human P-TEFb:

Purification of human P-TEFb is simplified by the use of antibodies that detect the small
subunit of P-TEFb. The inventor has generated rabbit polyclonal antibodies that react with the
small subunit of P-TEFb. These antibodies were raised against a recombinant human P-TEFb
small subunit. A cDNA encoding the human P-TEFb small subunit was amplified from human
cDNA using PCR™ primers designed from the human cDNA sequence found in the database.
The cDNA was cloned in a pET expression plasmid and the resulting plasmid was expressed in
DE3 cells. The recombinant protein was purified under denaturing conditions and subsequently
used to inoculate rabbits. The antibodies were generated using standard techniques (Pocono
Rabbit Farm).

On western blots the antibodies strongly recognize the bacterially produced human
protein and a 42 kDa protein in HeLa nuclear extracts. The inventor has analyzed HeLa extracts
that have functionally different levels of P-TEFb activity and has found that the differences
correlate with the level of the 42 kDa protein present in the extracts. Immunoprecipitation and
immunodepletion studies are useful in showing that functional human P-TEFb (transcriptional
activity and CTD kinase activity) is recognized by the antibodies.

Purification of human P-TEFb from HeLa cell nuclear extract is carried out by using 
standard methods, essentially as described in Examples 1 and 2 and Marshall and Price (1995). 
P-TEFb is assayed after each step of the purification using a western blot. Evidence already 
shows that the purification of human P-TEFb is achievable using methods similar to those used 
for the purification of the Drosophila protein. Chromatography on phosphocellulose was carried 
out and it was found that P-TEFb eluted in the high salt fractions. The inventor's results are 
consistent with the existence of more than one chromatographic form of P-TEFb. Further 
purification is achievable using chromatography on phenyl-Sepharose, Mono Q™, Mono S™ and
hydroxylapatite.

Although the data is consistent with the existence of multiple forms of the human large 
subunit, the form or forms isolated in the 42 kDa HeLa extracts correlate with P-TEFb activity. 

EXAMPLE 6 
P-TEFb is the Tat-Associated Kinase 

As P-TEFb is sensitive to DRB which is canonically a kinase inhibitor, it was clear that 
P-TEFb is a kinase (Marshall and Price, 1995; Example 1). Although likely targets for a 
transcription factor kinase that acts early during elongation would be RNA polymerase II or a 
basal initiation factor such as TFIIF, there is no obvious similarity between kinases known to be 
involved in transcription regulation and the subunit composition of P-TEFb (Marshall and Price, 
1995). Therefore it is a surprising discovery that P-TEFb is essential for the HIV viral protein 
Tat to activate elongation of the viral RNA genome. This finding is particularly surprising in 
that it was earlier reported that the TAK protein appears to be most similar to factor 2 or P-TEFa,
not P-TEFb (Marshall and Price, 1995).

Tat is a viral protein that acts as an activator of transcription of the HIV viral transcription
unit. It binds to short nascent HIV transcripts and causes RNA polymerase II to synthesize
mRNA sized transcripts. It was hypothesized by several investigators that Tat might enhance the
action of RNA polymerase II CTD kinases and that this might have a positive effect on the
elongation potential of the polymerase. An interaction between Tat and a CTD kinase has been
demonstrated using immobilized Tat and HeLa nuclear extract (Yang et al., 1996). The human
immunodeficiency virus Tat proteins specifically associate with TAK in vivo and require the
CTD of RNA polymerase II for function. This TAK protein is sensitive to DRB, but has not
been purified or otherwise identified and has not been shown to be required for Tat
transactivation.



Similar studies to the immediately preceding were performed using the 48 amino acid 
transactivation domain of Tat (Tat 48A) fused to GST (GST-Tat) and coupled to glutathione 
beads. HeLa extracts were incubated with the beads and then the beads were extensively 
washed. The proteins that were bound were eluted with SDS and heat and analyzed on a gradient 
protein gel. Silver staining indicated that less than 1% of the HeLa proteins bound, but a 
significant portion of the human P-TEFb was bound as detected by western blotting using 
antibodies against the small subunit of human P-TEFb. 

A. Methods 

1. Tat Proteins: 

GST-Tat 1 48A and GST-Tat 1 48A P181S in the pGEX2T vector (Pharmacia) were 
obtained from the AIDS Research and Reference Reagent Program (NIH) and expressed in 
E. coli. 500 ml cultures were induced and the crude lysates obtained after using the French press
were analyzed on silver stained SDS-PAGE gels. Roughly equal levels of the 35 kDa proteins
were expressed (1 mg/ml of lysate).

2. Binding Assays:

25 ul of glutathione beads were washed 4 times with 200 ul EBC (50 mM Tris, pH 8, 150
mM NaCl, 0.5% NP-40) and then incubated with 100 ul of an E. coli lysate containing GST
expressed from a pET21 vector (see Example 4). To pre-clear the extract the GST beads were
washed 4 times with EBC and then incubated for 3 h at 4°C with 100 ul of HeLa nuclear extract
that had been spun at 15,000 rpm for 5 min. 5 ul of glutathione beads (10 ul of 50% slurry) was
transferred to a yellow pipet tip plugged with glass wool and then washed with 400 ul EBC + 5
mM DTT. 5 ul of E. coli lysate containing either GST-Tat 1 48A or GST-Tat 1 48A P181S
(controls were done with no bound protein and with GST alone) and 50 ul EBC + 5 mM DTT
was allowed to slowly flow over the beads 4 times. The beads were then washed with 200 ul
EBC + 5 mM DTT + 0.075% SDS and then 100 ul EBC + 5 mM DTT. 15 ul pre-cleared HeLa
nuclear extract was passed over the tip 6 times. The beads were then washed with 900 ul EBC +
5 mM DTT + 0.03% SDS. 20 ul of protein gel loading buffer was added and the tip boiled for 4
min. 10% of the eluted proteins were analyzed on a 6-15% polyacrylamide SDS protein gel by
silver staining and 90% of the sample was run on a similar gel and blotted to nitrocellulose. The
blot was probed with a 1:1000 dilution of anti-human P-TEFb antisera followed by 1:20,000
dilution of the secondary antibody. Reacting proteins were detected using an ECL detection kit.

3. In vitro Transcription Conditions 

One preferred method to examine the functional interaction between Tat and P-TEFb is to
first develop a P-TEFb-dependent human in vitro transcription system and then examine the
function of Tat with and without added purified human P-TEFb. The P-TEFb-dependent system
can be obtained in any of three ways.

First, since human P-TEFb has similar chromatographic properties to Drosophila P-TEFb,
it is possible to remove the factor by passing extracts through phosphocellulose at 0.4 M
HGKEDP as was done for the Drosophila extracts (Marshall and Price, 1995; Example 1). The
HeLa-FT (flowthrough) fraction contains much less P-TEFb and is dependent on added P-TEFb
to obtain DRB-sensitive transcription.

The second method is based on antibody depletion. Purified IgGs from the rabbit
anti-P-TEFb serum are covalently coupled to Affigel (BioRad). The IgG beads are washed
thoroughly with PBS and then incubated at room temperature with HeLa nuclear extract. The
beads are then removed by centrifugation or filtration and the resulting extract is examined for a
reduction in DRB-sensitive transcription. The third method uses a more defined in vitro
transcription system in which pure or partially pure transcription factors are mixed together. The
first two methods result in an extract that has greatly reduced levels of P-TEFb and in the third
method one of the required fractions would be P-TEFb.

In all cases the depleted extracts or fractions not containing P-TEFb are examined for
their ability to allow Tat transactivation. A pulse-chase or continuous labeling protocol is used
in which the template DNA is preincubated with the extract, 20 mM HEPES, pH 7.6 and 7 mM
MgCl2 for 20 min at 30°C. For the pulse-chase protocol, nucleotides, including [α-32P]CTP, are
added to start the pulse; 2 min later, excess cold CTP is added to initiate the chase. Reactions are
stopped after various chase times. During the pulse, 10 ul reaction mixtures contain the
following:

20 mM HEPES (pH 7.6), 7 mM MgCl2, 600 uM each GTP, UTP and ATP, 5 uCi of
[α-32P]CTP (~1 uM CTP), 66 mM KCl, 10-40 ng/ml DNA template, and 3 ul of HeLa-FT
or HeLa nuclear extract as a control.

For the chase, unlabeled CTP is added to bring the total concentration of CTP to 1.2
mM and the final reaction volume to 12 ul. Human P-TEFb and Tat are added to the
preincubation mixture where appropriate. For the continuous labeling protocol the pulse mix is
supplemented with 30 uM CTP and the reactions are allowed to continue for 20 min with all
combinations of inclusion of Tat and P-TEFb. Reactions are stopped by adding 200 ul of stop
solution (1% Sarkosyl, 50 mM Tris, pH 8.0, 50 mM EDTA, 100 mM NaCl and 100 ug/ml
tRNA). The reaction mixtures are phenol extracted and the nucleic acids are ethanol
precipitated, washed with 70% ethanol, dried and analyzed by gel electrophoresis.

To correlate the inhibition of binding to the inhibition of Tat transactivation, reactions
would be supplemented with increasing concentrations of candidate small compounds or
proteins. The inhibition would be quantitated by measuring the amount of runoff transcript or
the amount of bound P-TEFb and activity plotted versus the amount of the inhibitor. 50%
inhibition points would be determined and compared in the two assays. If inhibition was similar
for a given compound in both assays it would be concluded that the compound functioned by
inhibiting the interaction of P-TEFb with Tat.
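The determination of a 50% inhibition point from a titration can be illustrated with a short calculation. The sketch below is not the procedure of the present disclosure, which does not specify a fitting method; it assumes that activity is expressed as a fraction of the uninhibited control, that concentrations are positive and listed in increasing order, and that the 50% point is estimated by linear interpolation on a log10 concentration axis between the two titration points that bracket 50% activity. The function name and all numerical values are hypothetical.

```python
import math


def fifty_percent_point(concentrations, fractional_activities):
    """Estimate the inhibitor concentration giving 50% of control activity.

    Interpolates linearly on a log10 concentration axis between the first pair
    of adjacent points that bracket 0.5. Returns None if no pair brackets 0.5.
    """
    points = list(zip(concentrations, fractional_activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            fraction = (a_lo - 0.5) / (a_lo - a_hi)
            log_c = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical titrations of one compound in the two assays (concentrations in uM).
binding_point = fifty_percent_point([0.1, 1.0, 10.0, 100.0], [0.95, 0.80, 0.20, 0.05])
transcription_point = fifty_percent_point([0.1, 1.0, 10.0, 100.0], [0.90, 0.75, 0.25, 0.05])
```

Under these assumptions, similar values of `binding_point` and `transcription_point` for a given compound would correspond to the "similar inhibition in both assays" criterion described above.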

B. Results 

Control studies indicated that P-TEFb binding was specific. A GST-Tat protein 
containing a single amino acid change in the activation domain that causes Tat to lose its ability 
to transactivate in vivo did not bind P-TEFb. Other controls, glutathione beads alone or beads 
with GST only, also did not bind P-TEFb. These results indicate that P-TEFb is the CTD kinase 
that binds to Tat. The ability of Tat to increase the elongation potential of RNA polymerase II
can now be explained through its interaction with P-TEFb. 

Further confirmation that the Tat/P-TEFb interaction is important is obtained by
determining that human P-TEFb is one of the factors required for reconstructing Tat
transactivation in vitro. A protein blot containing all the required factors is probed with the 
human P-TEFb small subunit antibodies to determine which fraction contains P-TEFb. Since it 
is known that the human P-TEFb elutes in the high salt step from phosphocellulose, required 
factors from that region are likely candidates. Confirmation of the Tat/P-TEFb interaction is 
obtained by correlating results with inhibitors of the binding assay (microtitre plate assay) with 
the in vitro transcription assays.

EXAMPLE 7

P-TEFb is Required for HIV-1 Tat Transactivation in vitro 



A. Generation of Antibodies to PITALRE

PITALRE-CT antibodies were affinity-purified rabbit IgG directed to the C-terminal 20
amino acids of PITALRE (Santa Cruz Biotechnology). Antibodies against the whole PITALRE
were generated using purified recombinant protein as antigen. First, two primers,
5'-pGCAGGATCCAGAATTCCATATGGCAAAGCAGTACGACTCGG-3' (SEQ ID NO:41)
and 5'-pCAGTACTCGAGTTATCAGAAGACGCGCTCAAAC-3' (SEQ ID NO:42), were
used in a PCR™ reaction to amplify the cDNA of human P-TEFb small subunit. The human
brain cDNA mix (Clontech Co.) was used as template. The PCR™ product was digested with
Eco RI and Xho I. The resulting 1.1 kb fragment was purified using Qiagen Gel Extraction Kit™
(Qiagen Co.). pET21a (Novagen Co.) was digested with Eco RI and Xho I. The resulting 5.4 kb
fragment was purified using Qiagen Gel Extraction Kit™ (Qiagen Co.). The 1.1 kb fragment and
5.4 kb fragment were ligated by using T4 DNA ligase. After amplification, the cloned vector was
digested with Nde I and the larger fragment (6.5 kb) was purified using Qiagen Gel Extraction
Kit™ (Qiagen Co.) and religated. The final vector was amplified and transformed into DE3
(BL21) competent cells for expression of human P-TEFb small subunit.

The transformed DE3 cells were grown to OD600 = 0.6 and induced with 1 mM IPTG.
After a 3 hour induction, the cells were collected and lysed by passing through a French press
three times. The lysate was subjected to centrifugation at 15,000 x g for 30 minutes. The pellet
was solubilized in 0.1 M TUS (20 mM Tris, pH 7.5, 0.1 M NaCl and 7 M urea) and loaded onto a
Mono Q column (Pharmacia Co.). The flow-through (FT) fraction of the Mono Q column was
loaded onto a Mono S column (Pharmacia Co.). The flow-through fraction of the Mono S
column was subjected to dialysis against phosphate buffer (20 mM phosphate, pH 7.0). The
dialyzed solution was centrifuged at 15,000 x g for 30 minutes. The pellet was suspended in
phosphate buffer (20 mM phosphate, pH 7.0) and used to generate rabbit antibodies following
standard protocols (Pocono Rabbit Farm and Laboratory, Inc.). Preimmune serum was obtained
before injection of antigen (the human P-TEFb small subunit) into the rabbit. Test bleeding was
performed 42 days after the first injection of the antigen and antisera were generated monthly
after the test bleeding.

B. Immunodepletion of human P-TEFb 

Immunodepletion was performed by passing Protein A Sepharose-precleared HNE (in
20 mM HEPES, pH 7.6, 15% glycerol, 165 mM KCl, 0.1 mM EDTA, 1 mM DTT, 0.1
mM PMSF) through two affinity columns made with Protein A Sepharose beads pre-bound with
anti-PITALRE-CT antibodies or control IgGs (affinity-purified rabbit anti-goat IgG, Sigma). For
every 50 ul of HNE, 10 ul of protein A beads containing 1 ug of bound IgG was used. After
depletion of HNE, the antibody-containing beads were extensively washed with 100 times the
bead volume of 20 mM HEPES (pH 7.6), 0.5% NP-40, 1% Triton X-100, and 5 mM DTT,
600 mM or indicated concentration of NaCl, and then washed with 25 times the bead volume of
20 mM HEPES (pH 7.6) and 1 mM DTT. The amounts of the washed beads used for the kinase
assays and silver staining were the equivalent of 1 ul and 10 ul of HNE, respectively.
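The proportions stated above (10 ul of beads carrying 1 ug of IgG per 50 ul of HNE, a first wash of 100 bead volumes and a second wash of 25 bead volumes) can be scaled to other extract volumes. The following sketch only restates that arithmetic; the function name and dictionary keys are hypothetical, and linear scaling to larger or smaller volumes is an assumption rather than a tested condition.

```python
def immunodepletion_plan(hne_ul: float) -> dict:
    """Scale the stated bead, IgG and wash amounts to a given HNE volume (ul)."""
    beads_ul = hne_ul * 10.0 / 50.0  # 10 ul protein A beads per 50 ul HNE
    igg_ug = hne_ul * 1.0 / 50.0     # 1 ug bound IgG per 50 ul HNE
    return {
        "beads_ul": beads_ul,
        "igg_ug": igg_ug,
        "detergent_salt_wash_ul": 100.0 * beads_ul,  # 100 bead volumes
        "final_wash_ul": 25.0 * beads_ul,            # 25 bead volumes
    }


# Example: amounts for 200 ul of HNE.
plan_200 = immunodepletion_plan(200.0)
```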

C. CTD kinase assay 

CTD kinase assays were performed in a 20 ul reaction containing 20 mM HEPES
(pH 7.6), 10 uM ATP, 5 uCi [γ-32P]ATP, ~10 ng Drosophila RNA polymerase II, 5 mM MgCl2,
and purified Drosophila P-TEFb or immunoprecipitated human P-TEFb. In the reactions
containing DRB it was added to 50 uM unless other indicated amounts were used. The reactions
were incubated at 30°C for 1 hour.

D. TAK activity assay 

Preparation of Tat fusion proteins and the TAK pull-down assay were conducted as
described by Herrmann and Rice (Herrmann and Rice, 1993; 1995) with modifications as
described. Nuclear extract, cytoplasmic extract, or fractions containing partially purified TAK
were incubated with glutathione-Sepharose beads containing GST-Tat fusion proteins for 1 hour
at 4°C with gentle rocking. DE3 (BL21) bacteria containing the GST-Tat expression vectors
were obtained from NIH AIDS Research and Reference Reagent Program. For maximum
sensitivity, GST-Tat48A was used unless otherwise specified. The beads (30 ul, 50% slurry)
were washed 6 to 8 times with 1 ml EBCD buffer (50 mM Tris, pH 8.0, 120 mM NaCl, 0.5%
Nonidet P-40, and 5 mM DTT) containing 0.03% SDS, then 2 to 4 times with Tat kinase buffer
(TKB/Mg: 50 mM Tris-HCl, pH 7.6, 5 mM DTT, 5 mM MnCl2, and 4 mM MgCl2), and brought
to 50 ul with TKB/Mg buffer and kinase assay mix. The final reaction contained 2 uM ATP,
10 uCi γ-32P-ATP (ICN, 3000 Ci/mmole), and 50-100 uM of CTD trimer peptide CTD3
(ACSYSPTSPSYSPTSPSYSPTSPSKK, SEQ ID NO:68). Reactions were incubated at 25°C for
40 minutes, stopped by boiling in Laemmli sample buffer, and resolved by electrophoresis in
15% polyacrylamide:bis-acrylamide (30%:0.15%) gels.
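The relative amounts of labeled and unlabeled ATP in the 50 ul TAK reaction follow from the stated activity and specific activity. The sketch below shows only the unit conversion; it assumes that the 2 uM ATP figure refers to unlabeled ATP, which the text does not specify, and it ignores radioactive decay. The function names are hypothetical.

```python
def labeled_atp_pmol(activity_uci: float, specific_activity_ci_per_mmol: float) -> float:
    """Picomoles of labeled ATP from its activity (uCi) and specific activity (Ci/mmol)."""
    mmol = (activity_uci * 1e-6) / specific_activity_ci_per_mmol  # uCi -> Ci, then Ci -> mmol
    return mmol * 1e9  # mmol -> pmol


def unlabeled_atp_pmol(concentration_um: float, volume_ul: float) -> float:
    """Picomoles of ATP in a given volume; uM multiplied by ul gives pmol."""
    return concentration_um * volume_ul


hot_pmol = labeled_atp_pmol(10.0, 3000.0)   # 10 uCi at 3000 Ci/mmol, about 3.3 pmol
cold_pmol = unlabeled_atp_pmol(2.0, 50.0)   # 2 uM in 50 ul, 100 pmol
```

Under the stated assumption, the labeled nucleotide is a small fraction of the total ATP in the reaction.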

E. Partial Purification of TAK 

TAK was partially purified from a HeLa cell cytoplasmic fraction (2.6 gm protein)
prepared according to Ausubel et al. (Ausubel et al., 1989) except that dialysis was omitted.
Proteins precipitating between 10% and 40% saturation of ammonium sulfate (816 mg) were
resuspended in DEAE buffer (25 mM HEPES, pH 7.6, 150 mM KCl, 0.1 mM EDTA, 1 mM
DTT, 0.1 mM PMSF, 4 mM MgCl2, protease inhibitors aprotinin, leupeptin, pepstatin A at 1
ug/ml each, and 10% glycerol), dialyzed against the same buffer, and applied to a 230 ml
DEAE-Sepharose column equilibrated in DEAE buffer. The flow-through fraction (350 mg protein) was
concentrated by 50% ammonium sulfate precipitation. The proteins were resuspended in HE
buffer (same as DEAE buffer except 25 mM HEPES, pH 6.9, and 100 mM KCl were used) and
loaded onto a perfusion chromatography heparin affinity column (POROS 20 HE; PerSeptive
Biosystems). The column was washed with the same buffer and proteins were eluted with a
linear gradient of 100-500 mM KCl in 15 column volumes of HE buffer. TAK activity eluted
from 200 mM to 250 mM KCl. Active fractions were diluted with an equal volume of the same
buffer except that the pH was 8.0 and KCl was omitted, and loaded onto a perfusion
chromatography cation exchange column (POROS 20 SP; PerSeptive Biosystems). The column
was washed with SP buffer (the same as HE buffer except that the buffer was 25 mM HEPES pH
7.5) and eluted with a linear gradient of 100-500 mM KCl in 15 column volumes of SP buffer.
The active fractions (1 mg protein) eluted at ~300 mM KCl.

F. Transcription assay

The transcription template (Nco I-digested pLTR-4/CAT vector containing the HIV-1
LTR from -153 to +80) was used in a pulse-chase transcription experiment. Reactions (12 ul
total; 20 mM HEPES, pH 7.6, 7 mM MgCl2, 20 ug/ml DNA template, 3 ul HNE, 64 mM KCl,
with 50 uM DRB or 0.3 ul Drosophila P-TEFb (Marshall et al., 1996) as indicated) were
pre-incubated for 20 minutes at 30°C, pulsed for 2 minutes by adding ATP, GTP, UTP to 600 uM
each and 5 uCi [α-32P]CTP (~0.1 mM), and chased for 5 minutes by adding CTP to 1.2 mM.
Reactions were stopped, phenol extracted and analyzed on a 6% polyacrylamide gel as described
(Marshall and Price, 1995). pLTR-TAR-Luc and pLTR-DTAR-Luc plasmids were digested with
Eco RI to generate TAR and DTAR templates that were used in the studies. The TAR template
contains HIV-1 LTR from -475 to +76, while the DTAR template contains HIV-1 LTR from
-475 to +19. Transcription with TAR and DTAR templates generates runoffs of 694 and 646
nucleotides, respectively. The transcription reaction mix contained 20 mM HEPES (pH 7.6),
7 mM MgCl2, 60 mM KCl, 0.5 ul HNE (or depHNE as indicated), 20 ng/ml TAR template (or
DTAR template as indicated), indicated amount of HIV-1 Tat (86 residue HIV-1 Tat followed by
a streptavidin binding tag at its carboxyl terminus) and DRB. The reaction mix was
pre-incubated for 15 minutes at 30°C and the transcription was started by adding nucleotides to final
concentrations of 50 uM ATP, 50 uM GTP, 50 uM UTP, 10 uM CTP, and 5 uCi [α-32P]CTP
(~0.1 mM), and continuously labeled for 20 minutes, or by adding nucleotides to final
concentrations of 50 uM ATP, 50 uM GTP, 50 uM UTP, and 5 uCi [α-32P]CTP (~0.1 mM), and
pulsed for 2 minutes. The reactions were stopped and the reaction mixes were phenol-extracted. The
transcripts were then analyzed on a 6% polyacrylamide gel as described (Marshall and Price,
1995).

G. PITALRE is a Component of Human P-TEFb 

According to the present discovery, PITALRE is the functional homologue of the small
subunit of Drosophila P-TEFb. Therefore, removal of PITALRE from HeLa nuclear extract
(HNE) should eliminate DRB-sensitive runoff transcripts. To confirm this, PITALRE was
immunodepleted from HNE with antibodies directed against the last 20 amino acids of PITALRE
which are not shared with other known kinases. Western blot analysis indicated that PITALRE
was removed by anti-PITALRE antibodies to levels below detection, but not by control
antibodies. The depleted HNE was unable to generate DRB-sensitive 633-nt run-off transcripts
from an HIV-1 LTR template in a pulse/chase transcription reaction. Addition of pure
Drosophila P-TEFb to the depleted extract restored DRB-sensitive transcription. These results
indicate that depletion of PITALRE abolished human P-TEFb activity, supporting the hypothesis
that PITALRE is a component of human P-TEFb.

PITALRE was not previously known to be a CTD kinase. Therefore, the material
immunoprecipitated during the depletion of HNE was examined for CTD kinase activity. The
antibody-loaded beads containing PITALRE were washed extensively with high salt and
subjected to a CTD kinase assay. Similar to Drosophila P-TEFb, beads containing PITALRE
(together with any other strongly associated proteins) were able to convert the largest subunit of
Drosophila RNA polymerase II to the hyperphosphorylated IIo form. Control beads were
inactive. As expected, all phosphorylation was sensitive to 50 mM DRB. In the control reaction
with Drosophila P-TEFb, autophosphorylation of both subunits (43 and 124 kDa) was seen. In
the reaction with beads containing PITALRE antibodies, the 40-kDa PITALRE, a band of similar
size to the large subunit of Drosophila P-TEFb and several other bands were phosphorylated.

To examine the association of other proteins with PITALRE, immunoprecipitates were
washed with buffer containing non-ionic detergents and increasing amounts of salt. The proteins
associated with the beads were analyzed by SDS polyacrylamide gel electrophoresis followed by
silver staining. When no salt was present many proteins were retained on the beads. Most of
these proteins were removed by washing with 200 mM NaCl. No changes occurred in the
proteins visible after a wash with buffer containing NaCl higher than 400 mM. Besides
immunoglobulin heavy and light chain and PITALRE (40 kDa), proteins with sizes 87, 105, 133,
and 140 kDa were found. When rabbit anti-goat IgG control beads were used no other proteins
except for the immunoglobulins were seen after high salt washes. The immunoprecipitates were
incubated with [γ-32P]-ATP to determine which proteins became phosphorylated. The beads
washed without salt carried out extensive phosphorylation of many protein substrates with little
DRB-sensitivity. After washing with 400 mM or higher concentration of NaCl, only a few
proteins were labeled and all phosphorylation was DRB-sensitive. Of the major proteins
associated with PITALRE only the 105 kDa protein was not phosphorylated. At all salt
concentrations the partially DRB-sensitive phosphorylation of a 207 kDa protein was observed.
Except for PITALRE the identity of the other proteins is unknown. The sizes of these proteins
do not correlate with subunits of other known basal transcription factors. The other proteins
could be constituents of a larger complex containing P-TEFb or different complexes containing
PITALRE.

A distinguishing characteristic of P-TEFb is its sensitivity to the kinase inhibitors DRB
and H-8. In vitro transcription and the CTD kinase activity of Drosophila P-TEFb are both
inhibited by these two compounds and in both assays DRB is 10 fold more potent than H-8
(Marshall and Price, 1995; Marshall et al., 1996). The effects of DRB and H-8 were determined
on transcription in the HNE (FIG. 7A) and on kinase assays using the immunoprecipitated
human P-TEFb (FIG. 7B). In both assays DRB was the more effective inhibitor. DRB and H-8
compete with ATP for binding to the kinase active site and, therefore, it is inappropriate to
compare directly the 50% inhibition points under different conditions, especially if different
concentrations of ATP are used (Marshall et al., 1996). However, the ratio of 50% points for
different compounds under identical conditions can be compared. This ratio is more likely to be
independent of assay condition. The ratio of 50% inhibition points (H-8/DRB) was 23 uM/1 uM
= 23 for the transcription assay and 16 uM/0.65 uM = 25 for the CTD kinase assay. These
ratios were similar to each other, suggesting that the same kinase, namely human P-TEFb, was
inhibited in both assays. Considering the sequence similarity between the small subunit of
Drosophila P-TEFb and PITALRE, the functional similarity between Drosophila P-TEFb and
the activity removed from HNE by PITALRE antibodies, and the presence of other potential
subunits in the immunoprecipitates, the inventor concluded that PITALRE is a component of
human P-TEFb.

H. P-TEFb Specifically Associates with the Activation Domain of HIV Tat

Several lines of evidence led the inventor to believe that P-TEFb associates with the viral
transactivator Tat. Tat transactivation is sensitive to DRB (Braddock et al., 1991; Marciniak and
Sharp, 1991) and requires the CTD (Chun and Jeang, 1996; Parada and Roeder, 1996; Yang et
al., 1996), and Tat associates with a DRB sensitive CTD kinase (TAK) (Herrmann and Rice,
1993; Herrmann and Rice, 1995; Chun and Jeang, 1996; Yang et al., 1996). The hypothesis was
tested by ascertaining if the human P-TEFb kinase associates with Tat during incubation with
HeLa extracts. Glutathione beads containing various GST-Tat fusion proteins were incubated
with HeLa extract and extensively washed. Proteins associated with Tat constructs containing an
intact activation domain (Tat72 and Tat48D) were able to phosphorylate the synthetic peptide,
CTD3, as well as RNA polymerase II. GST-Tat fusions containing mutations in the activation
domain that abolish Tat-transactivation (Herrmann and Rice, 1995; Rice and Carlotti, 1990) were
not able to pull down TAK. The proteins associated with the Tat constructs were probed with
anti-PITALRE antibody by western blot analysis. PITALRE was detected only when the
constructs contained an intact Tat-transactivation domain. This indicates that P-TEFb is a Tat
associated CTD kinase.

When human P-TEFb was depleted from HNE by antibodies to PITALRE, TAK activity
(assayed by GST-Tat48A pull-down) was reduced to less than 2% of that found in the intact
extract. This strongly suggests that under the conditions used human P-TEFb is the predominant
CTD kinase that associates with Tat. Others have shown that TFIIH can associate with Tat
(Parada and Roeder, 1996; Garcia-Martinez et al., 1997b), but under the conditions used no p62
subunit of TFIIH was detected as being bound to Tat nor did the inventor detect a reduction in
the amount of the subunit in the PITALRE-depleted extract. In addition, when glutathione beads
containing TAK were probed with antibodies to all three subunits of CAK (CDK7, cyclin H,
MAT1; supplied by D. Morgan) no evidence of the TFIIH associated kinase was found.

To further confirm that human P-TEFb can associate with Tat, TAK was partially 
purified from HeLa cells by sequential chromatography on DEAE, Heparin and SP resins. 
Fractions eluted from the second and third columns, Heparin and SP, were assayed for TAK 
activity and probed for human P-TEFb. In the eluate of both columns, TAK activity assayed by 
GST-Tat48Δ pull-down correlated with P-TEFb determined by western analysis using two 
different preparations of PITALRE antibodies. A GST construct with a mutation in the Tat 
activation domain did not become associated with CTD kinase activity when incubated with the 
same column fractions. In addition, the drug sensitivities of TAK and human P-TEFb were 
compared. As expected, TAK activity was inhibited by DRB and H-8 in a manner similar to 
P-TEFb. 

I. Human P-TEFb is Required for Tat-Stimulated Elongation 

To investigate the function of P-TEFb in Tat transactivation directly, the effects of Tat on 
initiation and elongation in whole or PITALRE-depleted extracts were compared (FIG. 8). Two 
different DNA templates were used. The TAR template contained HIV-1 LTR sequences from 
-475 to +76, while the ΔTAR template lacked the sequence encoding TAR and contained HIV-1 
LTR sequences from -475 to +19. Using HNE (not depleted) and a continuous labeling protocol, 
10 ng/µl Tat stimulated the generation of runoff about 12 fold (filled bars in FIG. 8). This effect 
of Tat was mostly inhibited by 50 µM DRB and required the TAR sequence. A DRB titration 
indicated that the DRB sensitivity of Tat-stimulated runoff was similar to that determined for the 
low level of runoff from the HIV-LTR in the absence of Tat. When PITALRE-depleted extract 
was used, the majority of the stimulatory effect of Tat was abolished. This result strongly 
suggests that human P-TEFb is required for efficient transactivation by Tat. Add-back of 
Drosophila P-TEFb to the depleted extract stimulated runoff in the presence or absence of Tat, 
but did not restore the ability of Tat to specifically enhance elongation. This suggests that other 
required factors were removed by the depletion of PITALRE or that Drosophila P-TEFb lacks 
appropriate domains required for interaction with Tat or other cofactors. Addition of the high 
salt washed PITALRE immunoprecipitate had no effect on runoff in the presence or absence of 
Tat, indicating that immobilization had a negative impact on the function of P-TEFb. Although 
consistent with an effect on elongation, the experiment did not rule out the possibility that the 
major effect of Tat was on initiation. 

From the autoradiograph and the normalized quantitation data graphed in FIG. 8 (filled 
bars) it was evident that Tat had a modest effect on transcription (about a two fold increase in 
runoff) in the presence of 50 µM DRB, with the ΔTAR template, or even in the absence 
of human P-TEFb. To further analyze the effect of Tat on transcription, a similar set of studies 
was performed, except that a pulse-only transcription protocol was used instead of a continuous 
labeling protocol. Quantitation of the short pulsed transcripts (less than 20 nucleotides in length) 
gives the efficiency of initiation. Under these conditions it was clear that increasing Tat had the 
effect of increasing transcription initiation (open bars in FIG. 8). At the highest level of Tat used 
(10 ng/µl) initiation increased about two fold compared to when no Tat was added. This effect 
on initiation did not require TAR (FIG. 8). Moreover, the effect of Tat on initiation was not 
sensitive to DRB and did not require P-TEFb. The effect of Tat on initiation can be used to 
explain the slight effect of Tat on the generation of runoff in the presence of DRB, ΔTAR 
template, or depleted HNE. The two assays taken together indicate that the major effect of Tat is 
on elongation, although initiation is slightly affected. Most importantly, P-TEFb is required for 
the effect of Tat on elongation, but not its effect on initiation. 
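
The way the two assays combine can be illustrated with a small calculation. The sketch below 
assumes, as a simplification that is not stated in the text, that runoff is the product of the 
number of initiation events and the fraction of polymerases that elongate productively, so that 
the elongation component of a stimulation is the runoff fold divided by the initiation fold. The 
function name is hypothetical; the inputs are the approximate fold changes reported above. 

```python
def elongation_fold(runoff_fold: float, initiation_fold: float) -> float:
    """Fold change in elongation efficiency implied by the two assays.

    Assumes runoff = (initiation events) x (fraction reaching runoff),
    a simplifying model that is not stated in the text.
    """
    if initiation_fold <= 0:
        raise ValueError("initiation_fold must be positive")
    return runoff_fold / initiation_fold


# Approximate values from the text: ~12 fold runoff (continuous labeling)
# and ~2 fold initiation (pulse-only) at the highest Tat level in undepleted HNE.
tat_elongation = elongation_fold(12.0, 2.0)

# With DRB, the TAR-deleted template, or depleted HNE, runoff rose only ~2 fold,
# which the ~2 fold initiation effect accounts for on its own.
residual_elongation = elongation_fold(2.0, 2.0)

print(f"elongation component with functional P-TEFb: ~{tat_elongation:.0f} fold")
print(f"elongation component without P-TEFb function: ~{residual_elongation:.0f} fold")
```

Under this assumed model, most of the Tat effect in undepleted extract falls on elongation, and 
none of the residual two fold effect does, which is consistent with the conclusion drawn above. 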

EXAMPLE 8 

Determining the Region of P-TEFb Responsible for the Interaction with Tat 

Although Example 7 clearly shows that P-TEFb associates with the HIV protein Tat, the 
interaction of Tat with P-TEFb could be through either subunit. A combination of in vivo and 
in vitro assays is used to localize the interaction surface. Both subunits are expressed, e.g. in 
E. coli, purified and the recombinant proteins used in vitro to determine whether one or both of the 
subunits bind to Tat or compete for the binding of intact P-TEFb to Tat. The peptide domains 
are also expressible as GST fusion proteins, and soluble proteins are obtained by removing the 
GST moiety by specific protease digestion. 

To identify proteins that associate with P-TEFb in vivo, the inventor utilizes the yeast 
two-hybrid system (Fields and Song, 1989; 
Chien et al., 1991; Durfee et al., 1993; Harper et al., 1993; U.S. Patent No. 5,667,973, 
incorporated herein by reference). This system utilizes the yeast GAL4 protein, a well studied 
eukaryotic transcriptional activator protein. Transcriptional activator proteins are proteins that 
bind to cognate promoter elements upstream of particular genes, and thereupon activate 
transcription of the associated gene (Johnston, 1987). This activation function is believed to 
occur through recruitment of specific proteins which are required, along with RNA polymerase, 
to effect transcription. 

Reports have shown that the DNA binding function and the transcriptional activation 
function reside in two distinct regions of the GAL4 protein. Further, it has been shown that these 
regions can be localized to relatively short peptide regions, which can function separately 
(Brent and Ptashne, 1986; Keegan et al., 1986). This ability forms the basis of the two-hybrid 
system. In the yeast two-hybrid system, the gene encoding the known protein is cloned as a 
fusion protein with the GAL4 DNA binding domain. Then a gene encoding a protein suspected 
of interacting with the known protein, or a cDNA library (to assay for the presence of a gene 
encoding an interacting protein), is cloned as a fusion protein with the transcriptional activation 
domain. Both constructs are then introduced into individual yeast cells. 

Binding in vivo between the two fusion proteins recombines the GAL4 DNA binding 
domain with the GAL4 transcriptional activation domain, which leads to the transcriptional 
activation of a marker gene operatively positioned downstream of a GAL4 binding site. The 
transcriptional activation construct is then recovered from the identified yeast cell, and the 
interacting protein is identified by DNA sequencing. 
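
The selection logic of the screen just described can be sketched in a few lines. This is only a 
toy model under stated assumptions: the protein names, the interaction table and both function 
names are hypothetical, and a real screen reads out growth or colour of yeast colonies rather 
than a lookup. 

```python
from typing import FrozenSet, Iterable, List, Set


def reporter_active(bait: str, prey: str, interactions: Set[FrozenSet[str]]) -> bool:
    """The marker gene is transcribed only when the bait (DNA binding domain
    fusion) and the prey (activation domain fusion) bind each other."""
    return frozenset((bait, prey)) in interactions


def screen_library(bait: str, library: Iterable[str],
                   interactions: Set[FrozenSet[str]]) -> List[str]:
    """Return the library members whose fusion switches the marker gene on."""
    return [prey for prey in library if reporter_active(bait, prey, interactions)]


# Hypothetical interaction table and cDNA library, for illustration only.
known_interactions = {frozenset(("bait_X", "prey_B"))}
hits = screen_library("bait_X", ["prey_A", "prey_B", "prey_C"], known_interactions)
print(hits)
```

In practice each hit corresponds to a yeast colony from which the activation-domain construct 
is recovered and sequenced. 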

The standard yeast two-hybrid system described above is a positive association screening 
assay. A variation of the two-hybrid assay is the subject of U.S. Patent No. 5,525,490, by 
Erickson and Powers, issued June 11, 1996 and incorporated herein by reference. Using this 
reverse two-hybrid system, inhibitors of protein-protein interaction can be assayed. In this 
system, the interaction between the fusion proteins represses the transcription of a marker gene. 
Agents (for example, but not limited to, proteins) which disrupt the association of the fusion 
proteins are identified by the increase in transcription of the marker gene. 

The expressed proteins are included in binding reactions containing GST-Tat 
immobilized on glutathione beads and P-TEFb. In the event that additional cofactors are 
required, they are supplied as a crude extract or partially purified fraction. If there are cofactors 
required for the binding of P-TEFb to Tat, the binding assay with pure P-TEFb can be used as a 
means to purify the cofactor. The cofactor can then be used as a potential target for drugs that 
block the function of Tat through the loss of the ability to bind to P-TEFb. 

After washing the beads, the binding of P-TEFb is monitored by western analysis 
using antibodies to the desired subunit of human P-TEFb. The subunit that inhibits the binding 
of P-TEFb to Tat is a strong candidate for being the one that contains the domain that mediates 
the interaction between Tat and P-TEFb. 



At the same time, the individual proteins are monitored directly for their ability to bind to 
Tat. The binding of portions of the subunits to Tat is monitored by using antibodies (see 
Example 5) that recognize the peptides. Antibodies from the crude antisera raised against 
individual subunits can be used. (Antibodies to the large subunit of P-TEFb can be used 
similarly.) Alternatively, if the activity of antibodies in the crude antisera is insufficient, then 
antibodies are raised against the portions of the subunits that inhibit P-TEFb binding to Tat. 
Once direct binding of a peptide containing the interactive domain to Tat is detected, the 
identified peptide is useful for screening for drugs that inhibit the binding of P-TEFb to Tat. It is 
envisioned that these peptides are easy to prepare in large quantities using recombinant DNA 
technology for use in screening assays for inhibitors or activators of P-TEFb. 

A complementary approach is to utilize a reporter gene assay such as one that utilizes 
chloramphenicol acetyltransferase (CAT). The use of reporter gene assays is well known to 
those of skill in the art. Recombinant HeLa cells are prepared such that they express both Tat 
and the kinase and CTD domains of the subunits of human P-TEFb and are also able to support 
Tat-transactivation of an HIV promoter-driven CAT construct (HIV-CAT). It is envisioned that 
CAT activity is monitored when HIV-CAT is transiently transfected alone or co-transfected with 
increasing amounts of different constructs that respectively express the desired domain(s) of the 
human subunits. A decrease in CAT activity signifies that expression of the specific domain(s) 
inhibits the function of Tat. 
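
One way such a titration might be read out is sketched below. Everything in the block is an 
assumption made for illustration: the function names, the 50% cutoff, and the CAT values, which 
are invented and are not data from this disclosure. The sketch normalizes each CAT measurement 
to HIV-CAT transfected alone and asks whether activity falls as more domain construct is added. 

```python
from typing import List, Sequence


def relative_activity(cat_activity: Sequence[float], hiv_cat_alone: float) -> List[float]:
    """Express each CAT measurement as a fraction of HIV-CAT transfected alone."""
    if hiv_cat_alone <= 0:
        raise ValueError("baseline CAT activity must be positive")
    return [value / hiv_cat_alone for value in cat_activity]


def is_dose_dependent_decrease(relative: Sequence[float], max_final: float = 0.5) -> bool:
    """True when activity never rises with increasing construct and the final
    point is at or below max_final of baseline (an arbitrary, assumed cutoff)."""
    if not relative:
        return False
    never_rises = all(later <= earlier for earlier, later in zip(relative, relative[1:]))
    return never_rises and relative[-1] <= max_final


# Invented CAT readings (arbitrary units) at increasing amounts of construct.
baseline = 1000.0
inhibitory_domain = relative_activity([900.0, 600.0, 300.0], baseline)
neutral_domain = relative_activity([1000.0, 1050.0, 980.0], baseline)

print(is_dose_dependent_decrease(inhibitory_domain))
print(is_dose_dependent_decrease(neutral_domain))
```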

Antibodies to the respective subunit domains of P-TEFb that interact with Tat are 
generated, using standard methods as modified and described in Example 5, so that expression of 
the domains can be verified using western blotting. Polyclonal antibodies against the entire large 
subunit can also be used, but it is envisioned that the individual domains are useful as antigens 
for the production of more specific antibodies. 

These in vivo studies are useful for the identification of regions of P-TEFb that interact 
with Tat or that directly affect the normal cellular function of P-TEFb. Comparison of the in vivo 
expression results with the in vitro binding studies generally allows correlation of binding to 
function and specifically allows the determination of interaction domains that specifically affect 
Tat-transactivation but not the general function of P-TEFb. 

Although it is clear that P-TEFb is essential for elongation to occur, it is possible that the 
interaction between P-TEFb and Tat is not direct. One or more other proteins may form a bridge 
between P-TEFb and Tat. The identification of any such proteins would be advantageous for the 
understanding of Tat-transactivation and the development of drugs that inhibit its interaction. 
Candidates for Tat/P-TEFb bridging proteins are obtained using the yeast two-hybrid system, 
described above, to select for proteins that bind to P-TEFb. 

For example, the bait vector, which contains a region or regions of human P-TEFb fused 
to a DNA binding protein, is constructed and transfected into yeast cells along with a human 
cDNA library fused to a transcription activation domain. Yeast are selected based on their ability 
to activate transcription of the gene downstream of the DNA binding region. The large subunit 
of P-TEFb is detected if the small subunit of P-TEFb is used as bait. Once constructed, this 
method is then useful for cloning the large subunit. After the large subunit is cloned, the whole 
subunit or portions of it are useful as bait to find proteins that interact with P-TEFb. Candidates 
for bridging proteins are screened for their ability to function in vitro by allowing or enhancing 
the binding of P-TEFb to Tat. 

EXAMPLE 9 

Interaction of the Herpes Transactivator VP16 with P-TEFb 

The Herpes virus transactivator VP16 increases the ability of RNA polymerase II to 
synthesize DRB-sensitive runoff transcripts in vitro. Most promoters have sites for activator 
proteins that are present in cell extracts and can allow for the production of DRB-sensitive long 
transcripts. But under transcription conditions like those in Marshall and Price (1992), the basal 
adenovirus E4 promoter contains no activator binding sites except for 5 Gal4 binding sites, which 
give rise to very low levels of DRB-sensitive runoff transcripts in a Drosophila Kc cell nuclear 
extract. Reasonable levels of initiation occur but most of the transcripts (>95%) are short 
(abortive). 

When a protein containing the N-terminal portion of the VP16 activation domain (amino 
acids 413-456) fused to the DNA binding region of Gal4 (amino acids 1-147) was added to the 
transcription reaction, the amount of DRB-sensitive runoff increased 6.2 fold. Only part of the 
increased production could have been due to increased initiation because only 1.8 times as many 
polymerases initiated in the presence of Gal4-VP16 (as determined by quantitating the transcripts 
generated during a short pulse). 

The fusion proteins used were Gal4-VP16 N (wildtype) and Gal4-VP16 N-FA442 (mutant), 
which has the single amino acid change indicated. The proteins were 
made by inserting the indicated sequences downstream of the P^ promoter in an expression 
vector and transforming E. coli XA90 cells. One liter of cells was grown in LB + Amp to 0.7 
OD600. After a 3 h induction with IPTG, cells were lysed by sonication in 80 ml of lysis buffer 
(20 mM HEPES, pH 7.5, 10 µM zinc acetate, 20 mM β-mercaptoethanol and 200 mM NaCl). 
The proteins were precipitated with 0.25% polyethyleneimine, spun, and suspended in lysis 
buffer. The pellet was redissolved in lysis buffer with 750 mM NaCl and spun again. The 
supernatant was ammonium sulfate precipitated with a 40% saturated solution. The pellet was 
redissolved in 20 ml lysis buffer with 100 mM NaCl and 1 mM DTT instead of 
β-mercaptoethanol. The supernatant was dialyzed against lysis buffer with 100 mM NaCl. 

Transcription conditions were as used in Marshall and Price (1995) and as described in 
Example 1. If indicated, 75 ng of Gal4-VP16 N or Gal4-VP16 N-FA442 was added per reaction. 

A mutant Gal4-VP16 that does not activate well in vivo stimulated DRB-sensitive runoff 
transcripts 2.8 fold and had a slight stimulatory effect on initiation (1.5 fold). Since P-TEFb is 
responsible for DRB-sensitive runoff transcripts, these data strongly indicate that P-TEFb binds 
to the VP16 activation domain. 

A Packard InstantImager™ was used for the quantitation. After running the products of 
the transcription reaction on a denaturing gel, the gel was dried and imaged for 30 min in the 
InstantImager™. Initiation was compared with and without the activator protein by comparing 
the counts generated by all transcripts during the pulse or by comparing the counts that reached 
runoff after chasing with 250 mM KCl, which has been shown to be directly related to initiation. 
Both methods gave the same results. Radioactivity in the region of runoff after a normal low salt 
chase was used to quantitate productive elongation products. DRB-sensitive runoff was 
calculated by subtracting the runoff in the lane with DRB from the runoff in the lane without 
DRB. P-TEFb is present in the Kc cell extract used in the reactions. 75 ng of Gal4-VP16 N or N 
mutant was used in each 12 µl reaction. 
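
The subtraction described above, and the fold stimulation derived from it, can be written out as 
follows. This is a minimal sketch: the function names are hypothetical and the lane counts are 
invented placeholders chosen only so that the arithmetic reproduces the 6.2 fold figure 
reported above; they are not measured values. 

```python
def drb_sensitive_runoff(runoff_without_drb: float, runoff_with_drb: float) -> float:
    """Runoff counts in the lane without DRB minus counts in the lane with DRB."""
    return runoff_without_drb - runoff_with_drb


def fold_stimulation(with_activator: float, without_activator: float) -> float:
    """Ratio of a quantity measured with the activator to the same quantity without it."""
    if without_activator <= 0:
        raise ValueError("the no-activator value must be positive")
    return with_activator / without_activator


# Invented imager counts for the runoff region of four lanes (placeholders only).
basal = drb_sensitive_runoff(runoff_without_drb=150, runoff_with_drb=50)
activated = drb_sensitive_runoff(runoff_without_drb=670, runoff_with_drb=50)

print(f"DRB-sensitive runoff, no activator: {basal} counts")
print(f"DRB-sensitive runoff, with activator: {activated} counts")
print(f"fold stimulation: {fold_stimulation(activated, basal):.1f}")
```

The same `fold_stimulation` helper applies to the pulse counts used to compare initiation with 
and without the activator. 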

These results were extended by removing the upstream region of the Drosophila actin 
template normally used in Kc cell extract transcriptions and inserting the Gal4 binding sites 
(pBAGl). Transcription of 20 ng/ml pBAGl using normal pulse-chase conditions gave rise to 
a low level of DRB-sensitive runoff transcript, but inclusion of 75 ng of Gal4-VP16 N increased 
initiation slightly and significantly stimulated elongation. These are the same results obtained 
when using the E4 promoter. Therefore, the effect of VP16 is independent of the basal promoter 
that is used. In addition, elongation was stimulated by Gal4-VP16 N from the E4 promoter in 
HeLa nuclear extract transcriptions. 

EXAMPLE 10 

Assay for Drugs that Block the Interaction of P-TEFb and Tat 

A major goal in medicine today is to find drugs that can inhibit and prevent the 
occurrence of AIDS and the spread of HIV infection. It is envisioned that because P-TEFb has a 
central role in cell proliferation and interacts with Tat, P-TEFb is useful as a tool in screening 
immunoassays to find therapeutic drugs that will inhibit or block the proliferation of HIV. 

Immunoassays, such as an ELISA, are useful to screen for drugs that inhibit the 
interaction of Tat with P-TEFb. For example, Tat-coated microtiter plates are incubated with 
pure P-TEFb or P-TEFb and any other required factors (such as human nuclear extract or 
fractions derived therefrom) needed to cause association with Tat. As indicated previously, a 
requirement for additional cofactors is unlikely, but the existence of needed cofactors does not 
complicate this assay when HeLa extract is used. As before, these cofactors can be supplied by 
using the crude cellular extract. 

As the interactions between components in this assay are interdependent upon the 
concentration and activity of each other, parameters for each component are established by 
initially using wide ranges and optimizing the amount of each component as desired. For 
example, 1-10 µg of Tat can be bound per well initially, and 15 µl of HeLa extract provides 
sufficient P-TEFb (about 10 ng) to establish binding. Primary anti-P-TEFb small subunit 
antibodies can initially be in a 1:1000 dilution of antiserum. Secondary antibodies can be 
initially diluted 1:20,000. Of course, these amounts, as with all other parameters, may be 
optimized as desired. 

It is envisioned that any peptides, proteins or polypeptides derived from P-TEFb and 
shown to be active towards Tat may be used as a P-TEFb template in an immunoassay. After 
washing, the plates are incubated with antibodies that have been prepared using standard protocols 
and methods as exemplified in Example 5 and that recognize either the small or large subunit, 
intact P-TEFb or peptides derived therefrom (see Example 8). Following standard methods for 
immunoassay detection, a label is applied to the plates such that microtiter wells that contain 
P-TEFb bound to Tat yield a positive signal. 

To determine that a candidate drug inhibits the interaction of P-TEFb with Tat, a 
competitive assay is performed. It is contemplated that P-TEFb proteins, subunits, polypeptides 
or peptides can be used. Decreasing or increasing concentrations of candidate drugs are added to 
microtiter plates which contain a constant amount of P-TEFb template and Tat. Disruption of the 
interaction of P-TEFb with Tat is detected as the reduction of the positive signal which indicates 
P-TEFb template bound to Tat. 
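
A reduction in signal across a drug titration is commonly summarized as percent inhibition and 
a half-maximal inhibitory concentration. The sketch below shows one simple way this might be 
done, by linear interpolation on log concentration between the two points that bracket 50% 
inhibition. The approach, the function names and the plate readings are all assumptions made 
for illustration; the readings are invented and are not data from this disclosure. 

```python
import math
from typing import Optional, Sequence


def percent_inhibition(signal: float, no_drug_signal: float, background: float = 0.0) -> float:
    """Percent loss of the P-TEFb/Tat signal relative to the no-drug well."""
    window = no_drug_signal - background
    if window <= 0:
        raise ValueError("no-drug signal must exceed background")
    return 100.0 * (1.0 - (signal - background) / window)


def estimate_ic50(concentrations: Sequence[float],
                  inhibitions: Sequence[float]) -> Optional[float]:
    """Interpolate on log10(concentration) between the two adjacent points that
    bracket 50% inhibition; return None if no pair brackets it.
    Concentrations must be positive."""
    points = sorted(zip(concentrations, inhibitions))
    for (c_low, i_low), (c_high, i_high) in zip(points, points[1:]):
        if i_low <= 50.0 <= i_high and i_high > i_low:
            fraction = (50.0 - i_low) / (i_high - i_low)
            log_ic50 = math.log10(c_low) + fraction * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_ic50
    return None


# Invented readings: drug concentration (µM) and normalized plate signal.
drug_um = [0.1, 1.0, 10.0, 100.0]
signals = [0.95, 0.75, 0.25, 0.05]
inhibitions = [percent_inhibition(s, no_drug_signal=1.0) for s in signals]

ic50 = estimate_ic50(drug_um, inhibitions)
print(f"estimated IC50: {ic50:.2f} µM")
```

A fitted dose-response curve would normally replace the interpolation once enough 
concentrations have been tested. 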



Drugs which can strongly inhibit or even prevent the interaction of Tat and P-TEFb 
template are potential candidates for further screenings to determine if they interfere with normal 
cellular processes. Preferably, inhibitors are small compounds suitable for use as anti-HIV drugs 
or compounds. The small compounds would bind (like allosteric effectors do) to regions of the 
Tat or P-TEFb that are not involved directly in enzymatic activity and this binding would inhibit 
the interaction between the two proteins. 

Candidate drugs are examined, for example, for their ability to inhibit the normal 
interaction of P-TEFb with RNA polymerase II. The ideal candidate drug only inhibits or 
prevents the interaction of Tat with P-TEFb and does not inhibit or prevent the normal interaction 
of P-TEFb with RNA polymerase II or any other normal cellular activities. Suitable screening 
methods include, but are not limited to, additional immunoassays, cell culture-based assays, 
animal testing models and clinical trials. 

EXAMPLE 11 

Characterization of Human P-TEFb. 



Immunoprecipitation of PITALRE from HeLa extracts indicates that several other 
proteins are tightly associated with the kinase subunit even in the presence of 1 M NaCl and 
non-ionic detergents. Proteins of molecular masses 87, 105, 133, 140 and 207 kDa are visible with 
silver staining, but the stoichiometry of these proteins is not known. There could also be 
additional proteins that do not stain well or that are hidden under the immunoglobulin bands 
present in the immunoprecipitates. 


A. Characterization of the Two Potential Cyclin Subunits of Human P-TEFb. 

Although northern analysis indicates that the intron-containing mRNA is a minor species, 
it is not known if the protein encoded becomes functionally associated with PITALRE. To 
examine the association of the proteins encoded by these potential cyclin subunit clones, 
antibodies are made to each in rabbit (Pocono Rabbit Farm). These antibodies are used to probe 
western blots of PITALRE immunoprecipitates from HeLa cell extract. The recognition of 
proteins in PITALRE immunoprecipitates is strong evidence for a functional interaction. If the 
potential cyclin subunit antibodies recognize proteins associated with PITALRE, the size of these 
proteins is compared with the size of the proteins expressed in the baculovirus system. If there is 
a discrepancy between the size of the endogenous proteins and the recombinant proteins, 
additional clones are obtained that encode a protein of appropriate size. Results indicate that 5' 
and 3' RACE products frequently have unusual sequences due to the presence or absence of 
intronic sequences and it is possible that the clones obtained so far may not have the correct 
amino- or carboxyl-terminal ends. However, this evidently does not negate their function or the 
ability of the skilled researcher to identify correct terminal sequences. The protein encoded by 
human HBL3 cDNA was co-expressed with PITALRE in a baculovirus system and produced an 
active heterodimeric kinase. The protein encoded by human cyclin HBL3 exhibited a mobility 
identical to the 87 kDa protein found in PITALRE immunoprecipitates. These results suggest 
that the protein subunits described herein are P-TEFb. Further confirmation is obtained by using 
antibodies to the cyclin subunit to probe PITALRE immunoprecipitates. 

To determine if the identified cyclin protein is part of an active complex, 
immunodepletions like those done with PITALRE (Example 6) are performed. If the ability to 
synthesize DRB-sensitive transcripts is eliminated from HeLa extracts by the antibodies, a strong 
functional connection is made between the cyclin subunit and P-TEFb. At the same time, 
antibodies against the identified cyclin subunit are used to carry out immunoprecipitation and the 
resulting pattern of proteins is compared to PITALRE immunoprecipitates. Besides PITALRE, 
it is important to determine what other proteins the two immunoprecipitates have in common. 
The current understanding of complexes containing PITALRE does not make clear if all the 
observed proteins are in a single complex or if perhaps there are several different 
PITALRE-containing complexes. Using antibodies against another protein in the complex allows a 
better understanding of what proteins are in the complex with functionally active PITALRE. 
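
The comparison of the two immunoprecipitates amounts to matching bands by apparent molecular 
mass. A minimal sketch of that bookkeeping follows. The PITALRE band list is taken from the 
masses given earlier in this Example; the band list for the cyclin immunoprecipitate, the 3 kDa 
matching tolerance and the function name are hypothetical and serve only to make the sketch 
runnable. 

```python
from typing import List, Sequence, Tuple


def shared_bands(bands_a: Sequence[float], bands_b: Sequence[float],
                 tolerance_kda: float = 3.0) -> List[Tuple[float, float]]:
    """Pair up bands from two immunoprecipitates whose apparent masses agree
    within tolerance_kda (an assumed allowance for gel-to-gel variation)."""
    pairs = []
    for mass_a in bands_a:
        for mass_b in bands_b:
            if abs(mass_a - mass_b) <= tolerance_kda:
                pairs.append((mass_a, mass_b))
    return pairs


# Masses (kDa) of proteins seen in PITALRE immunoprecipitates, from the text.
pitalre_ip = [87, 105, 133, 140, 207]
# Hypothetical band list for an anti-cyclin immunoprecipitate (not real data).
cyclin_ip = [43, 88, 141]

print(shared_bands(pitalre_ip, cyclin_ip))
```

Bands that appear in only one of the two lists would be candidates for membership in distinct 
PITALRE-containing complexes. 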

B. Analysis of Complexes Containing PITALRE and a Cyclin Subunit of P-TEFb. 

Since P-TEFb is a cyclin-dependent kinase, it is possible that it may associate with other 
proteins that regulate its activity. Many CDKs can be inhibited by small molecular mass proteins 
that bind to the cyclin/kinase pair (Morgan, 1995a; MacLachlan et al., 1995). If such an 
inhibitory protein binds to P-TEFb, the complex would be inactive and would be difficult to find 
with conventional assays that require activity. Armed with antibodies against several subunits of 
human P-TEFb, different complexes containing PITALRE are examined to determine whether 
they have activity or not. HeLa nuclear extract is fractionated using conventional 
chromatographic methods and complexes containing PITALRE or the cyclin subunit are located 
by western blot. Initially, proteins in the extract are bound to phosphocellulose to remove most 
nucleic acid and the material eluted at 1 M KCl is applied to a Sepharose S200 column to effect a 
size separation. PITALRE and/or cyclin-containing complexes are identified and subjected to 
further purification. Immunoprecipitation is used to analyze the proteins associated with 
PITALRE at all stages of the purification. The column fractions and immunoprecipitates are 
examined for kinase activity and the column fractions are tested for their ability to function 
during Tat-transactivation when added back to PITALRE-depleted extract. Both active and 
inactive complexes are identified by this method. Purification of the complexes with known 
properties allows identification of other potentially regulatory subunits through protein 
sequencing methods like those used in the cloning of the subunits of Drosophila P-TEFb. 

C. Molecular Analysis of Human P-TEFb. 

The availability of antibody and cDNA probes allows the assessment of expression 
patterns of P-TEFb. Some characterization of PITALRE has been carried out and it has been 
reported that PITALRE was localized to the nucleus and PITALRE mRNA was expressed 
ubiquitously over a variety of human tissues (Grana et al., 1994). In addition, PITALRE has 
been mapped to a region of chromosome 9 involved in certain types of cancer (Bullrich et al., 
1995). The inventor's work indicates that mRNAs encoding both of the cyclin subunits that have 
been isolated are expressed in a similar manner to that seen for PITALRE. Northern blot 
analyses of tissues involved in the immune response are used to determine if cells that are 
normally infected with HIV preferentially express a specific cyclin subunit or a specific form of 
one of the subunits. 

Mapping of the cytogenetic location of cyclin subunit(s) found to complex with PITALRE 
is used to determine if the areas map to other loci involved in similar types of cancer as found for 
PITALRE. Mapping is accomplished by using PCR primers that give specific products for 
human versus hamster DNA. Any correlations that are found are then useful for determining if 
there are genetic abnormalities in affected individuals. 

D. Functional Analysis of P-TEFb. 

An analysis of P-TEFb function in vivo is carried out using a co-transfection protocol 
with P-TEFb and an HIV-LTR reporter construct. The goal is to determine if over-expression of 
active P-TEFb causes an increase in Tat transactivation and to determine what regions of P-TEFb 
are required. Both HeLa and Jurkat cell lines, which support modest or very high levels of Tat 
transactivation respectively, are used. Expression of PITALRE has been shown to modestly 
stimulate Tat transactivation in several cell lines even though others have found that 
expression of PITALRE does not dramatically increase the level of kinase activity found in 
PITALRE immunoprecipitates (Garriga et al., 1996a). This inability to increase the kinase 
activity of PITALRE is likely due to a limitation in the amount of the cyclin subunit(s) required 
for activity. In addition, the inventor and others (Garriga et al., 1996a) have not been able to get 
a large over-expression of PITALRE, which suggests that the level of the kinase subunit may be 
regulated by a post-transcriptional mechanism. Co-expression of a cyclin subunit may allow for 
a greater level of expressed proteins if the ratio of the kinase/cyclin pair is involved in regulating 
the level of the kinase. This can be studied by examining PITALRE levels during transient 
expression of a cyclin subunit in an induced PITALRE cell line. If higher levels of the knockout 
kinase are observed, it may be possible to achieve much higher levels of P-TEFb in cells that 
have both kinase and cyclin subunits under the control of active promoters. This type of 
experiment may provide general information about the regulation of PITALRE levels in the cell. 
If the levels of the kinase knockout PITALRE protein are increased when the cyclin subunit is 
expressed, then the effect of the potential dominant negative mutation on cell growth may be 
observed. If this is the case, the induction of the PITALRE kinase knockout in the presence of 
over-expressed cyclin subunit may be lethal to the cells. 

To examine the effect of over-expression of P-TEFb on Tat transactivation, HeLa or 
Jurkat cells are co-transfected with constructs that express both subunits of P-TEFb and Tat 
along with an HIV-LTR reporter construct. The levels of both subunits of P-TEFb are compared 
before and after transfection to determine the level of over-expression. Measurement of reporter 
activity is made with and without Tat and with and without individual subunits of P-TEFb. If 
over-expression of P-TEFb causes a reasonable stimulation of Tat transactivation, constructs 
expressing mutant P-TEFb subunits are used to assess their effects. The kinase knockout 
mutation of PITALRE is studied to see if it will act as a dominant negative mutation. Most 
importantly, the requirement for the carboxyl-terminal domain of the cyclin subunit for Tat 
transactivation is examined. Effects of all interesting P-TEFb constructs on a Tat responsive 
reporter are compared to effects on other promoters, such as the complete CMV promoter, that 
use different activators. These in vivo studies should complement similarly designed in vitro 
experiments described below. 

EXAMPLE 12 

Examination of the Tat/P-TEFb Interaction. 

An immobilized GST fusion protein containing an intact Tat activation domain is 
incubated with fractions around the elution point of PITALRE and extensively washed as was 
done previously (see Example 6). The material bound to Tat is assayed for CTD kinase activity using 
RNA polymerase II as substrate (Marshall et al., 1996) and for the presence of PITALRE by 
immunoblotting. As was done before, negative controls are performed using Tat proteins with 
mutations that neutralize the activation domain. Any complex that fractionates with the ability to 
bind to Tat is monitored by immunoprecipitation with anti-PITALRE antibodies to identify 
subunits/cofactors that are needed for association with Tat. A PITALRE-containing complex that 
binds to Tat may be purified in this manner and examined for its ability to support efficient Tat 
transactivation. For example, the recombinant PITALRE/human cyclin HBL1-1 and 
PITALRE/HBL3 complexes can be examined for their ability to interact with Tat and compared 
to the complexes derived from fractionation of HeLa nuclear proteins. 

If a purified P-TEFb complex interacts with Tat but recombinant PITALRE/cyclin
complexes do not, a requirement for cofactors that do not co-purify with P-TEFb is
likely. To find such cofactors, an assay is used in which the association of recombinant
PITALRE/cyclin complexes (rP-TEFb) with Tat is measured in the presence of
fractions of HeLa nuclear extract. The association is monitored with either a CTD kinase assay
or an immunoblot. Initially, the assay utilizes HeLa nuclear extract that was depleted of
PITALRE in the presence of 1 M KCl. This type of extract should not be depleted of factors that
interact ionically with PITALRE. An increase in the association of rP-TEFb with Tat in the
presence of the depleted extract suggests the presence of cofactors. If such results are obtained,
the association reactions are modified to learn whether the cofactors interact with Tat or with
rP-TEFb alone. First, immobilized Tat is incubated with the depleted extract. To determine if the
cofactor(s) interact with Tat, the beads are washed and then incubated with rP-TEFb. If the
affinity of the beads for P-TEFb increases, then Tat-associated cofactors are suggested. A similar
experiment in which rP-TEFb is immobilized via its HIS-tag, incubated with the extract and then
incubated with Tat shows if cofactor(s) associate with P-TEFb. In the latter experiment the
binding of Tat is monitored by immunoblotting with anti-Tat antibodies. It is possible that some
cofactors interact with both factors, but it is also possible that a cofactor interacts with just one
factor, causing a conformational change that enables that factor to associate with the other partner
in the Tat/P-TEFb complex.

The simplest model for the function of TAR RNA is that it exerts its effect on
transcription of the HIV-LTR due to its ultimate ability to recruit P-TEFb to the early elongation
complex. Tat has been shown by a number of groups to bind to TAR (Fong et al., 1997; Aboul-ela
et al., 1995; Neenhold and Rana, 1995; Rhim and Rice, 1994; Cullen, 1991) and P-TEFb or
then unspecified CTD kinases have been shown to associate with Tat in vitro (Herrmann and
Rice, 1993; Herrmann and Rice, 1995; Chun and Jeang, 1996) and in vivo (Yang et al., 1996).
However, linkage between TAR and P-TEFb or any CTD kinase has not been made.

TAR RNA is produced in vitro using T7 polymerase and is included in various binding
reactions like those described above. First, it is determined if the RNA has an effect on the
efficiency of interaction of P-TEFb with Tat in the crude extract. TAR RNA is titrated into
binding reactions that have otherwise been optimized for association of P-TEFb with Tat.
Interaction efficiency is monitored by determining the fraction of PITALRE remaining in the
extract after the Tat pull-down. The assay is then performed on any promising set of fractions
identified from the binding studies above. It is possible that TAR RNA may disrupt the
interaction of Tat with P-TEFb. Although at first this may seem inappropriate, it is important to
note that Tat may be a component of the preinitiation complex and may not need to be recruited
by TAR (Garcia-Martinez et al., 1997a). TAR may regulate the effect of Tat by disrupting the
interaction between Tat and P-TEFb, thereby allowing P-TEFb to function. Whatever the
results, the development of models of Tat transactivation benefits greatly from knowing the
effect of TAR RNA on the Tat/P-TEFb interaction.

If the results indicate that TAR enhances the association of P-TEFb with Tat, then an assay
in which TAR RNA is immobilized on paramagnetic particles is used. Binding of P-TEFb to the
beads, dependent on the presence of Tat and crude extract or fractions generated from an extract,
is followed. The complexes are examined for their ability to increase the activity of associated
P-TEFb. Specific activity of P-TEFb is monitored by a combination of western blotting, to
determine the relative amount of the factor, and CTD kinase assays, to measure the activity of the
factor. If the results indicate that TAR disrupts the interaction of P-TEFb and Tat, that
information is useful in interpreting the outcomes of the studies described below.
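
The specific activity comparison described above amounts to dividing the CTD kinase signal by the relative amount of PITALRE detected by western blotting. The following Python sketch illustrates that arithmetic only. The function names and all numerical values are hypothetical and are not taken from the studies described herein, and both signals are assumed to be background-corrected and within the linear range of detection.

```python
def specific_activity(kinase_signal, western_signal):
    """Return kinase signal per unit of immunoblot signal (arbitrary units)."""
    if western_signal <= 0:
        raise ValueError("western_signal must be positive")
    return kinase_signal / western_signal


def fold_change(sample, reference):
    """Ratio of specific activities; each argument is (kinase, western)."""
    return specific_activity(*sample) / specific_activity(*reference)


# Hypothetical readings: (CTD kinase signal, PITALRE band intensity)
bound_without_tar = (1200.0, 40.0)
bound_with_tar = (3000.0, 50.0)

# A value above 1 would suggest that the associated P-TEFb is more active
# per unit of protein, not merely more abundant on the beads.
print(fold_change(bound_with_tar, bound_without_tar))
```

Normalizing to the immunoblot signal separates a change in the amount of bound factor from a change in its activity, which is the distinction the assay is intended to draw.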

EXAMPLE 13 
Elucidation of the Mechanism of Tat Transactivation. 

Progress on understanding how Tat causes an increase in the number of polymerase
molecules that synthesize full length HIV transcripts has been slow until now because of the lack
of understanding of the elongation control process in general. Recent advancements in
elongation control include the present identification of Drosophila P-TEFb as a CTD kinase (see
also Marshall et al., 1996), the purification and characterization of Drosophila factor 2, an ATP
dependent RNA polymerase II termination factor (Xie and Price, 1996), and the purification of a
negative human elongation factor comprised of the homologues of the yeast SPT4 and SPT5
proteins. The present identification of the kinase subunit of human P-TEFb and the requirement
for P-TEFb in Tat transactivation make it clear that studies on the mechanism of Tat
transactivation should be focused on understanding the role of P-TEFb.

A. Association of Elongation Control Factors with the Transcriptional Machinery

The association of known elongation control factors with transcription complexes on the
HIV-LTR is studied to determine the effect of Tat on this association. The present immobilized
template technology (see also Marshall et al., 1996; Xie and Price, 1996; Marshall and Price,
1992) is used to determine if and when P-TEFb, factor 2, the SPT4/SPT5 complex, and other
factors become associated with the transcription complex. Preinitiation complexes, early
elongation complexes before TAR RNA is synthesized, and elongation complexes containing
TAR RNA are examined. The salt concentration used during washing steps and the exact
transcription conditions used are important factors in these studies. The strategy outlined below
is given as an example of the types of studies that are done.

The overall design of the association experiments is to form preinitiation complexes on
an HIV-LTR template immobilized on paramagnetic beads, allow the polymerase to progress to
the appropriate stage and then wash the complexes to remove non-associated proteins.
Antibodies against the factors are used to probe for their presence in the isolated complexes. For
this type of experiment to be successful a number of controls must be done. It is important to
first determine the efficiency of initiation from preinitiation complexes because uninitiated
polymerases would contribute to the signals obtained from immunoblotting of "elongation
complexes". Although uninitiated polymerases could be eliminated with a 1 M salt wash, such a
wash could also disrupt factors in a potential elongation control particle. It cannot be
assumed that a pulse of NTPs allows all preinitiation complexes to initiate and form early
elongation complexes. In the Drosophila system the inclusion of Mn++ instead of Mg++ in the
initiation phase increased the rate of initiation and, therefore, the fraction of polymerases that
initiate. Even in the presence of Mn++ less than half of the polymerases in preinitiation
complexes initiated, as evidenced by resistance to a 1 M salt wash (Marshall et al., 1996). The
initiation efficiency in a HeLa nuclear extract system was examined, and Mn++ was found not to
stimulate the rate of initiation. This suggests that initiation is already quite efficient. To
determine what fraction of polymerases initiate from the HIV-LTR in the presence and absence
of Tat, immunoblotting with antibodies to RNA polymerase II is used. Preinitiation complexes
are formed in the crude extract and then the complexes are washed extensively with buffer
containing the highest salt possible that does not cause loss of polymerases that can initiate.
These washed preinitiation complexes are the starting material for an experiment to determine
the efficiency of initiation. The amount of RNA polymerase II in these complexes, assayed by
immunoblotting, is compared to the amount after initiation and after washing with 1 M salt. The
NTP concentrations, salt conditions and times are varied as is practical to effect the greatest
efficiency of initiation. If the efficiency is greater than about 50% then the signals obtained by
immunoblotting are interpretable.
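
The initiation efficiency described above can be expressed as the fraction of the RNA polymerase II immunoblot signal that survives initiation followed by a 1 M salt wash. The Python sketch below is an illustration under stated assumptions: the function names, the optional background subtraction and the example band intensities are hypothetical, and only the approximate 50% cutoff is taken from the text.

```python
def initiation_efficiency(pol_in_pics, pol_after_salt_wash, background=0.0):
    """Fraction of polymerase that initiated (salt-resistant after NTP pulse).

    pol_in_pics: RNA polymerase II signal in washed preinitiation complexes.
    pol_after_salt_wash: signal remaining after initiation and a 1 M salt wash.
    background: blot background subtracted from both readings (assumption).
    """
    before = pol_in_pics - background
    after = pol_after_salt_wash - background
    if before <= 0:
        raise ValueError("no polymerase signal above background")
    # Clamp to [0, 1] so that blot noise cannot give an impossible fraction.
    return max(0.0, min(1.0, after / before))


def is_interpretable(efficiency, threshold=0.5):
    """Apply the approximately 50% criterion stated in the text."""
    return efficiency > threshold


# Hypothetical band intensities in arbitrary units
eff = initiation_efficiency(1000.0, 620.0, background=50.0)
print(round(eff, 2), is_interpretable(eff))
```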

Once good initiation conditions are established, the conditions that allow the
efficient formation of early elongation complexes containing nascent transcripts of between 10
and 30 nucleotides in length and complexes containing an intact TAR RNA stem and loop are
determined. The shorter complexes would have made the transition from abortive initiation into
elongation and, therefore, would have released contact with the promoter localized factors, but
would not have synthesized TAR RNA. These conditions should be similar to those used to
examine initiation from the HIV-LTR. To obtain the complexes with complete TAR RNA,
limiting amounts of nucleotides are used to advance the polymerase to a set of positions that
allow most complexes to have RNA in the 70 to 100 nucleotide range.

Alternatively, a second method is utilized to generate complexes with RNA of more uniform
length, in which the lactose (lac) repressor protein is bound to a template containing a lac repressor
binding site positioned downstream of the TAR region to block elongation by the polymerase
(Keen et al., 1996). The length of the RNA is monitored using incorporation of α-32P-CTP and
initiation efficiency is checked as described above. If a significant number of complexes are
terminating due to the action of factor 2, then antibodies against the newly cloned
human factor are used to deplete the factor from extracts before assembly of the preinitiation
complexes. Before this is done the effect of such depletion on Tat transactivation is examined.

After finding the appropriate conditions to efficiently generate the three types of
complexes, immunoblotting studies are performed with all antibodies available for the factors
potentially involved in elongation control. These include antibodies against PITALRE and the
cyclin subunit(s) of P-TEFb, human factor 2, SUPT4H (Chiang et al., 1996b) and SUPT5H
(Chiang et al., 1996a), subunits of TFIIH including those in the CAK (Morgan, 1995b), and both
subunits of TFIIF (Finkelstein et al., 1992). Tat is added at different times and its association
with the complexes is monitored with antibodies (Brake et al., 1990). Initially, elongation
complexes are generated in the presence of the extract and, therefore, in the presence of all
factors. As the studies progress this is modified by washing the preinitiation complexes or early
elongation complexes without TAR and then allowing the polymerases to advance to the next
complex. This allows determination of which factors are lost from earlier complexes.

Although the simplest model for Tat action involves its binding to TAR and recruitment
of P-TEFb, this model is complicated by several studies that indicate that Tat can associate with
RNA polymerase II and can become part of the preinitiation complex (Garcia-Martinez et al.,
1997a; Mavankal et al., 1996; Cujec et al., 1997). Using carefully controlled conditions,
attempts are made to confirm that Tat associates with preinitiation complexes and to extend the
findings to other promoters, such as the CMV promoter that is used as a control for all the
experiments described above. Association of Tat with the preinitiation complex helps to explain
why Tat has an effect on initiation, but does not explain the effect of Tat on elongation. It is
possible that TAR RNA from an early elongation complex reaches back into the next
preinitiation complex and affects the Tat/RNA polymerase II interaction. It is also possible that
Tat bound to TAR recruits an RNA polymerase II complex (holoenzyme) to the promoter. In
either case the polymerase initiating under the influence of Tat might be more susceptible to the
action of P-TEFb.

B. Functional Enhancement of P-TEFb Action by Tat

It is clear from in vivo and in vitro studies that Tat has the ability to enhance the function
of P-TEFb at the HIV-LTR. What is not clear is how this enhancement is accomplished.
P-TEFb is not used only at the HIV-LTR, but rather is a cellular factor involved in elongation
control at most promoters. Also, the LTR does not absolutely require Tat to produce full length
transcripts; if it did, mRNAs encoding Tat would not be produced.

One model consistent with most data is that the HIV-LTR differs from other
promoters in that elongation complexes produced from the LTR are especially refractory to
P-TEFb action. Tat may be required to eliminate a strong negative elongation potential imparted to
polymerases that initiate from the HIV-LTR. This idea is supported by the effect of a number of
P-TEFb inhibitors that blocked Tat transactivation. Although all of the inhibitors blocked Tat
transactivation at lower concentrations than those needed to block activation by another activator
on another promoter, at one log higher concentration all the inhibitors had cytotoxic effects,
presumably due to inhibition of the normal function of P-TEFb. In transient transfection assays,
increasing PITALRE concentration, and presumably P-TEFb activity, in a variety of cells was
able to stimulate reporter constructs driven by a number of promoters with different activators.
While Tat transactivation on the HIV-LTR was dramatically stimulated, increasing PITALRE
did not increase expression from the LTR in the absence of Tat. This supports the idea that the
LTR is refractory to P-TEFb action and suggests that the increased sensitivity of Tat
transactivation to the inhibitors might arise due to an increase in activity that counters the action
of P-TEFb.

There are several possibilities for activities that might counter P-TEFb at the HIV-LTR.
One such activity might be the CTD phosphatase described by Dahmus (Dahmus, 1996;
Chambers et al., 1995) and others (Chambers and Kane, 1996). If this is the case, an amount of
inhibitor that would only partially block P-TEFb might have a large effect on the function of the
factor because of the balance of the two opposing activities. Another possibility is the FBI-1
protein (Pessler et al., 1997) that may be responsible for the function of the IST (inducer of short
transcripts) element (Sheldon et al., 1993) found in the DNA around the start point of
transcription of the HIV-LTR.

The negative activity of the HIV-LTR relative to the CMV promoter is first produced
with a co-transfection/reporter system in HeLa cells, and the cause of the effect is then determined.
Ultimately, it is determined if the IST element is involved by examining the sensitivity of
CAT reporter expression from promoter constructs with and without the element
(Pendergrast and Hernandez, 1997; Sheldon et al., 1993). Cells are first transfected with the
wildtype LTR reporter construct and either a Tat expression plasmid or a mock plasmid. The
cells are plated into microtitre plate wells in duplicate, and dilutions of the compounds DRB
or TRB in 100% DMSO are added to the wells so as to maintain a constant 1% DMSO. Cells are
harvested and assayed for chloramphenicol acetyl transferase after 24 hours. General
cytotoxicity assays are used to monitor cell death as described in Mancebo et al. Once the
special sensitivity of the HIV-LTR is produced, the same assays are repeated using constructs in
which the IST has been deleted. If the IST is what makes the LTR different from other
promoters, then the level of drug sensitivity decreases upon deletion of the element and the LTR
minus IST looks more like a normal promoter. These experiments are carried out in HeLa cells
that exhibit a low but reasonable level of Tat transactivation and in Jurkat cells that exhibit a
much greater stimulation by Tat. If the IST element is the key to the uniqueness of the HIV-LTR,
then the characterized FBI-1 protein is a prime candidate for the factor responsible.
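
Holding DMSO at a constant 1% while varying the compound concentration, as described above, implies that each dilution in 100% DMSO is prepared at 100 times its intended final concentration and added at one hundredth of the final well volume. The Python sketch below shows that arithmetic. The function name, the 200 microliter well volume and the example concentrations are hypothetical illustrations and are not values from the studies described herein.

```python
def dmso_addition_plan(final_concs_uM, final_volume_uL, dilution_factor=100):
    """Plan compound additions that keep DMSO at 1/dilution_factor of the well.

    Returns a list of (final concentration, stock concentration in 100% DMSO,
    volume of stock to add), with concentrations in micromolar and volumes in
    microliters. A dilution_factor of 100 corresponds to 1% DMSO.
    """
    add_volume_uL = final_volume_uL / dilution_factor
    return [(c, c * dilution_factor, add_volume_uL) for c in final_concs_uM]


# Hypothetical dose series for one inhibitor in a 200 microliter well
for final, stock, volume in dmso_addition_plan([1, 3, 10, 30], 200):
    print(f"final {final} uM: add {volume} uL of {stock} uM stock")
```

A vehicle-only well receiving the same volume of 100% DMSO serves as the untreated reference for each dose series.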

While the in vivo studies are underway the effect of the IST element is examined in vitro.
Tat transactivation of the HIV-LTR is examined as in Example 6. First it is determined if the
50% inhibition point for the appearance of runoff from the LTR is different from that seen for
other promoters such as the CMV promoter. If the difference is produced in vitro then LTR
constructs with and without the IST element are used to probe for differences in the response to
the inhibitors of P-TEFb.
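
One simple way to estimate a 50% inhibition point from a titration is log-linear interpolation between the two inhibitor concentrations that bracket half-maximal runoff. The Python sketch below illustrates this under stated assumptions: the interpolation method, the function name and the example data are hypothetical choices made for illustration, and runoff signals are assumed to be expressed as a fraction of the no-inhibitor control.

```python
import math


def ic50(concs, responses):
    """Estimate the 50% inhibition point by log-linear interpolation.

    concs: positive inhibitor concentrations in ascending order.
    responses: runoff signal as a fraction of the untreated control.
    Returns None if the titration never crosses 0.5.
    """
    points = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= 0.5 >= r_hi and r_lo != r_hi:
            # Position of the 0.5 crossing between the bracketing points
            frac = (r_lo - 0.5) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical titrations (concentrations in micromolar)
doses = [1, 10, 100]
ltr_with_tat = [0.9, 0.5, 0.1]
cmv_control = [1.0, 0.8, 0.2]

ltr_ic50 = ic50(doses, ltr_with_tat)
cmv_ic50 = ic50(doses, cmv_control)
# A ratio above 1 indicates the LTR is the more inhibitor-sensitive promoter.
print(ltr_ic50, cmv_ic50, cmv_ic50 / ltr_ic50)
```

The same comparison can be applied to LTR constructs with and without the IST element, since a shift of the 50% inhibition point toward that of the control promoter is the outcome the text anticipates.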

C. Development of In vitro Assays that Allow Precise Determination of the Effect of Tat.

The study of Tat transactivation in vitro has been hampered by an inability to obtain
significant stimulation of elongation in a pulse chase assay designed to separate initiation
from elongation. It has been the rule rather than the exception to use unusual conditions to
obtain significant stimulation of DRB sensitive transcripts from the HIV-LTR in vitro. In some
labs a 30 minute "presynthesis" step was used (Marciniak and Sharp, 1991) and in others the
inclusion of citrate (Parada and Roeder, 1996) was needed to see significant Tat transactivation.
The inventor has found that a 20 minute continuous labeling assay in which the NTP
concentrations were lowered to 50 µM was sufficient to see a 12 fold stimulation.

The first step is to produce extracts from Jurkat cells, a T cell line that can support very
high levels of Tat transactivation. The standard nuclear extract preparation described previously
(Price et al., 1987), which has yielded very active extracts containing all the factors necessary for
the efficient production of DRB-sensitive runoff transcripts, is used. Cells are grown in
suspension with high levels of fetal calf serum and are harvested in log growth. The extract is
optimized for Tat transactivation with standard extract and template titrations as well as a
condition search that includes titrations of mono- (KCl) and divalent (Mg++ and Zn++) cations.
The inventor has found that the source of Tat is important. Jurkat cell lines stably expressing
wildtype Tat or a Tat cys22gly mutant (Zauli et al., 1995) are also useful. Extracts made from
these cells should have native Tat that is active as long as the activation domain has no
mutations. The negative control for in vitro transcription reactions is to use the mutant extract
normalized with the CMV promoter or the HIV-LTR template lacking the TAR element.

If Tat transactivation is found to be robust in the newly derived system, assays are used
that allow more insight into the step or steps affected by Tat. A pulse chase reaction is used but
otherwise the conditions are kept similar to the continuous labeling conditions that normally
work. Pulse reactions are long at first and then are shortened. Care is taken to determine nascent
transcript length so that it can be determined if nascent transcripts need to contain TAR RNA or
if generation of TAR RNA during the chase is sufficient. The LTR construct containing the lac
repressor block at +200 described above is used to generate elongation complexes containing
TAR RNA during the pulse and then IPTG is added at various times during the chase to observe
the kinetic parameters of functional Tat/TAR interaction.

The data are consistent with Tat having an effect on initiation that is manifest as an effect on
elongation during the continuous labeling assay. It is possible that Tat affects reinitiation of
other polymerase molecules at promoters just used. This hypothesis is examined by using a
multiple round assay that allows counting of the number of polymerases that initiate from a
single promoter during the course of the reaction. Basically, the lac repressor is used to block
polymerases that have entered productive elongation and have progressed about 700 bp into the
template. The second polymerase that initiates from the same promoter is stopped about 30 bp
earlier due to steric interference with the existing blocked polymerase (Szentirmay and
Sawadogo, 1993). The length of the 30 bp ladder indicates the number of polymerases that
reinitiate. IPTG allows all appropriately blocked polymerases to run off the end of the template
at 800 bp. The reinitiation assay can be run under conditions identical to the continuous labeling
conditions that have been shown to give reasonable Tat transactivation. If evidence that
reinitiation is stimulated by Tat is seen, then it is conclusive and dictates a path of research
designed to probe the reinitiation. If there is no evidence of reinitiation, but Tat is seen to work
well, then stimulation of reinitiation by Tat is ruled out as a major mechanism of Tat
transactivation.
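
The ladder readout described above can be reduced to counting consecutive rungs spaced about 30 nucleotides apart, starting at the transcript length set by the lac repressor block. The Python sketch below is a minimal illustration. The approximate 700 and 30 nucleotide values come from the text, while the function names, the matching tolerance and the example band sizes are hypothetical.

```python
def expected_ladder(block_position=700, spacing=30, n_polymerases=4):
    """Transcript lengths expected for successive stacked polymerases."""
    return [block_position - i * spacing for i in range(n_polymerases)]


def count_polymerases(observed_lengths, block_position=700, spacing=30,
                      tolerance=5):
    """Count consecutive ladder rungs present among observed band sizes.

    A rung is scored as present if any observed length lies within
    `tolerance` nucleotides of its expected size (tolerance is an assumption).
    Counting stops at the first missing rung.
    """
    count = 0
    while any(abs(length - (block_position - count * spacing)) <= tolerance
              for length in observed_lengths):
        count += 1
    return count


# Hypothetical band sizes read from a gel lane, in nucleotides
bands = [702, 668, 641, 300]
print(expected_ladder(n_polymerases=3))
print(count_polymerases(bands))
```

Comparing the count obtained with and without Tat under otherwise identical conditions gives a direct readout of whether additional rounds of initiation occur.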

EXAMPLE 14
Tat Transactivation in Yeast

Both subunits of human P-TEFb are first transfected into yeast using a centromere
plasmid with weak promoters driving the human cDNAs. The production of P-TEFb is followed
by immunoblotting. It is first determined if human P-TEFb replaces the yeast CTD kinase,
CTDK-I, which has recently been demonstrated to be able to stimulate long transcripts in a HeLa
extract system after inhibition of P-TEFb by DRB (Lee and Greenleaf, 1997; Sterner et al., 1995;
Lee and Greenleaf, 1992). Yeast lacking expression of the kinase subunit, CTK1, are viable but
have a cold sensitive phenotype and other growth defects. It is next determined if human
P-TEFb reverses these effects in a CTK1 null. If P-TEFb causes more normal growth, it is
possible that CTDK-I is the functional equivalent of P-TEFb in yeast. Yeast expressing human
P-TEFb are transfected with a construct containing the sequences from +1 to +80 of the HIV-LTR
coupled to a standard yeast promoter, CYC1. The hybrid promoter is used to drive a
β-galactosidase reporter. The effect of Tat is determined by including a Tat expression cassette in
the β-galactosidase reporter construct. The yeast system is then used to examine the interaction
between human P-TEFb and Tat.

Alternatively, if Tat transactivation is not attainable, the yeast system may still be used in
examining the interaction between Tat and P-TEFb if the biochemical approach suggests that the
two proteins interact. A two hybrid system is set up with a Gal4-Tat fusion protein as bait. The
human P-TEFb expression plasmid is modified such that the kinase carries a knockout mutation
and the cyclin subunit has an activation domain attached at the carboxyl-terminus. The reporter
containing Gal4 binding sites should be activated by the P-TEFb expression plasmid. If so, then
deletions of the large subunit are carried out to see if the interaction assayed by reporter activity
is lost. These studies are compared to studies that examine the ability of truncated versions of
P-TEFb to function in general elongation control in vitro and in Tat transactivation in vitro and
in vivo.

* * * 

All of the compositions and methods disclosed and claimed herein can be made and
executed without undue experimentation in light of the present disclosure. While the
compositions and methods of this invention have been described in terms of preferred
embodiments, it will be apparent to those of skill in the art that variations may be applied to the
compositions and methods and in the steps or in the sequence of steps of the method described
herein without departing from the concept, spirit and scope of the invention. More specifically, it
will be apparent that certain agents which are both chemically and physiologically related may be
substituted for the agents described herein while the same or similar results would be achieved.
All such similar substitutes and modifications apparent to those skilled in the art are deemed to
be within the spirit, scope and concept of the invention as defined by the appended claims.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Price, David H. 

(ii) TITLE OF INVENTION: P-TEFb COMPOSITIONS, METHODS AND
SCREENING ASSAYS

(iii) NUMBER OF SEQUENCES: 68 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Arnold, White & Durkee 

(B) STREET: P.O. Box 4433 

(C) CITY: Houston 

(D) STATE: TX 

(E) COUNTRY: USA 

(F) ZIP: 77210-4433 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US Unknown 

(B) FILING DATE: Concurrently Herewith 

(C) CLASSIFICATION: Unknown 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Fussey, Shelley P.M. 

(B) REGISTRATION NUMBER: 39,458 

(C) REFERENCE/DOCKET NUMBER: IOWA: 012 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (512) 418-3000 

(B) TELEFAX: (512) 418-3131 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1457 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 115..1326

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



TGTTGAGTCA ACAGCTGTAG ATACACCAAT TGTTGCCGAT TTCTTTCTTT TCGACTGTCG 60 

GCTTCTCGCG AAACTGTGAT TGTGAAAATT GTACAAATAG AGGCAAATTT AACC ATG 117 

Met 
1 

GCG CAC ATG TCC CAC ATG CTC CAG CAG CCT TCG GGG TCG ACG CCC TCC 165
Ala His Met Ser His Met Leu Gln Gln Pro Ser Gly Ser Thr Pro Ser
5 10 15

AAC GTG GGC TCC AGC TCA TCG CGC ACG ATG TCC CTG ATG GAG AAA CAA 213
Asn Val Gly Ser Ser Ser Ser Arg Thr Met Ser Leu Met Glu Lys Gln
20 25 30

AAG TAC ATC GAG GAC TAC GAC TTT CCC TAC TGC GAC GAG AGC AAC AAA 261
Lys Tyr Ile Glu Asp Tyr Asp Phe Pro Tyr Cys Asp Glu Ser Asn Lys
35 40 45

TAC GAA AAG GTG GCG AAA ATT GGC CAA GGC ACC TTC GGA GAG GTT TTT 309
Tyr Glu Lys Val Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu Val Phe
50 55 60 65

AAG GCT CGC GAG AAA AAG GGC AAC AAG AAG TTT GTG GCC ATG AAG AAG 357
Lys Ala Arg Glu Lys Lys Gly Asn Lys Lys Phe Val Ala Met Lys Lys
70 75 80

GTG CTG ATG GAC AAC GAA AAG GAG GGC TTT CCC ATC ACG GCT CTG CGA 405
Val Leu Met Asp Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu Arg
85 90 95

GAG ATC CGC ATC CTG CAG CTG CTA AAG CAC GAG AAC GTG GTG AAT CTG 453
Glu Ile Arg Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn Leu
100 105 110

ATC GAG ATC TGC CGC ACC AAG GCC ACC GCC ACG AAT GGT TAC AGA TCC 501
Ile Glu Ile Cys Arg Thr Lys Ala Thr Ala Thr Asn Gly Tyr Arg Ser
115 120 125

ACC TTC TAT TTG GTC TTT GAT TTC TGC GAA CAC GAT TTG GCA GGT CTT 549
Thr Phe Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly Leu
130 135 140 145

CTG TCC AAC ATG AAC GTC AAG TTC AGT CTG GGC GAG ATT AAG AAG GTT 597
Leu Ser Asn Met Asn Val Lys Phe Ser Leu Gly Glu Ile Lys Lys Val
150 155 160

ATG CAG CAG CTT TTA AAC GGT TTG TAT TAC ATC CAC AGC AAC AAG ATC 645
Met Gln Gln Leu Leu Asn Gly Leu Tyr Tyr Ile His Ser Asn Lys Ile
165 170 175

CTG CAC CGA GAC ATG AAA GCT GCC AAC GTG CTG ATT ACC AAG CAT GGC 693
Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Lys His Gly
180 185 190

ATC TTA AAG CTG GCT GAC TTT GGC TTG GCC CGT GCT TTT AGC ATT CCA 741
Ile Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Ile Pro
195 200 205

AAG AAC GAG AGT AAG AAT CGC TAT ACC AAT CGC GTA GTA ACC TTG TGG 789
Lys Asn Glu Ser Lys Asn Arg Tyr Thr Asn Arg Val Val Thr Leu Trp
210 215 220 225

TAC CGG CCG CCT GAG CTG CTA CTT GGT GAC CGC AAC TAT GGT CCA CCC 837
Tyr Arg Pro Pro Glu Leu Leu Leu Gly Asp Arg Asn Tyr Gly Pro Pro
230 235 240

GTG GAC ATG TGG GGA GCC GGC TGC ATA ATG GCC GAG ATG TGG ACA CGC 885
Val Asp Met Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr Arg
245 250 255

TCG CCC ATC ATG CAA GGC AAT ACG GAG CAG CAG CAG TTA ACC TTT ATT 933
Ser Pro Ile Met Gln Gly Asn Thr Glu Gln Gln Gln Leu Thr Phe Ile
260 265 270

TCG CAG CTA TGC GGC TCC TTT ACG CCG GAC GTG TGG CCG GGA GTG GAG 981
Ser Gln Leu Cys Gly Ser Phe Thr Pro Asp Val Trp Pro Gly Val Glu
275 280 285

GAG CTG GAG CTG TAC AAA TCC ATC GAG CTG CCA AAG AAC CAG AAG CGT 1029
Glu Leu Glu Leu Tyr Lys Ser Ile Glu Leu Pro Lys Asn Gln Lys Arg
290 295 300 305

CGA GTC AAG GAG CGC CTG CGT CCG TAT GTC AAG GAT CAA ACC GGC TGT 1077
Arg Val Lys Glu Arg Leu Arg Pro Tyr Val Lys Asp Gln Thr Gly Cys
310 315 320

GAT CTA TTG GAC AAA TTG CTG ACC CTT GAT CCC AAG AAA CGC ATC GAT 1125
Asp Leu Leu Asp Lys Leu Leu Thr Leu Asp Pro Lys Lys Arg Ile Asp
325 330 335

GCG GAC ACA GCT CTG AAT CAC GAC TTC TTC TGG ACG GAT CCC ATG CCC 1173
Ala Asp Thr Ala Leu Asn His Asp Phe Phe Trp Thr Asp Pro Met Pro
340 345 350

AGC GAC TTG AGC AAG ATG CTG TCC CAG CAC CTG CAG AGC ATG TTC GAG 1221
Ser Asp Leu Ser Lys Met Leu Ser Gln His Leu Gln Ser Met Phe Glu
355 360 365

TAC CTG GCG CAG CCA CGC CGC AGC AAC CAG ATG CGC AAC TAT CAC CAG 1269
Tyr Leu Ala Gln Pro Arg Arg Ser Asn Gln Met Arg Asn Tyr His Gln
370 375 380 385

CAA CTG ACC ACC ATG AAC CAG AAG CCC CAG GAC AAC AGT ATG ATT GAC 1317
Gln Leu Thr Thr Met Asn Gln Lys Pro Gln Asp Asn Ser Met Ile Asp
390 395 400

CGG GTT TGG TAGACTGCCA GAGGTGTACG CACCCGACTA ATAGTTTCTC 1366
Arg Val Trp

ACCTTCAACT AGCGTTAGGT TATTAGGTTA GTGTACAATA AAAATATTGG CATTTGCATT 1426 

AGCGCTTGCT CCAAATATAA AAAAAAAAAA A 1457 

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 404 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Ala His Met Ser His Met Leu Gln Gln Pro Ser Gly Ser Thr Pro
1 5 10 15

Ser Asn Val Gly Ser Ser Ser Ser Arg Thr Met Ser Leu Met Glu Lys
20 25 30

Gln Lys Tyr Ile Glu Asp Tyr Asp Phe Pro Tyr Cys Asp Glu Ser Asn
35 40 45

Lys Tyr Glu Lys Val Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu Val
50 55 60

Phe Lys Ala Arg Glu Lys Lys Gly Asn Lys Lys Phe Val Ala Met Lys
65 70 75 80

Lys Val Leu Met Asp Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu
85 90 95

Arg Glu Ile Arg Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn
100 105 110

Leu Ile Glu Ile Cys Arg Thr Lys Ala Thr Ala Thr Asn Gly Tyr Arg
115 120 125

Ser Thr Phe Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly
130 135 140

Leu Leu Ser Asn Met Asn Val Lys Phe Ser Leu Gly Glu Ile Lys Lys
145 150 155 160

Val Met Gln Gln Leu Leu Asn Gly Leu Tyr Tyr Ile His Ser Asn Lys
165 170 175

Ile Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Lys His
180 185 190

Gly Ile Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Ile
195 200 205

Pro Lys Asn Glu Ser Lys Asn Arg Tyr Thr Asn Arg Val Val Thr Leu
210 215 220

Trp Tyr Arg Pro Pro Glu Leu Leu Leu Gly Asp Arg Asn Tyr Gly Pro
225 230 235 240

Pro Val Asp Met Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr
245 250 255

Arg Ser Pro Ile Met Gln Gly Asn Thr Glu Gln Gln Gln Leu Thr Phe
260 265 270

Ile Ser Gln Leu Cys Gly Ser Phe Thr Pro Asp Val Trp Pro Gly Val
275 280 285

Glu Glu Leu Glu Leu Tyr Lys Ser Ile Glu Leu Pro Lys Asn Gln Lys
290 295 300

Arg Arg Val Lys Glu Arg Leu Arg Pro Tyr Val Lys Asp Gln Thr Gly
305 310 315 320

Cys Asp Leu Leu Asp Lys Leu Leu Thr Leu Asp Pro Lys Lys Arg Ile
325 330 335

Asp Ala Asp Thr Ala Leu Asn His Asp Phe Phe Trp Thr Asp Pro Met
340 345 350

Pro Ser Asp Leu Ser Lys Met Leu Ser Gln His Leu Gln Ser Met Phe
355 360 365

Glu Tyr Leu Ala Gln Pro Arg Arg Ser Asn Gln Met Arg Asn Tyr His
370 375 380

Gln Gln Leu Thr Thr Met Asn Gln Lys Pro Gln Asp Asn Ser Met Ile
385 390 395 400

Asp Arg Val Trp


(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4328 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CAGCCCTGCC GACGGCCATA CTTGAAAATA CATTTTTTTC TGCAAAGTTT GTCATTGTCA 60 

CTGTGTGAAT GGAATCTGTG ATGTGTTGTG GAATTAAAAA CGTCAAGTAA ACAACCCGTA 120 

ATGGTTAAAG TGCACGGCGA AAGCAGTGCG AATAACTATG AATTGATACA AAAGTTGCAT 180 

AACACGTCGC CTGGTGTCGC GGTTAGTGTG TTTTTCGTCT CGTTTCGTTT CCGCCGCAGT 240

CGCAGTTTCC AAAAAACCTC ACCACACCAT ACCATCTCCA CCACGCACAC ACACACACAA 300 

ACAAACACGC AGAGACGCGG CGGCGGAAAA AGTGTGCGGA CCGCGGATTT AACCCCTCGT 360

TCCAAACCCA AATTGGAGTC TCCCAAAAAC AGCGAAATAT CGAGTGTGGC TTAGCCGATG 420 

TGCCGTGCGA TCCCCACTGC CCCTTCCGTA CCGCTGCCAC CCCCGCCACA GCAGCAACGC 480 

ACACGGATAC GGACACAGAC ACCAATACCA GCGCACTCAA GCACGGCCGA CAAAGAAAGA 540 

GCGCTCTCCC TTCCTCTTTG TACAGTTAGT TCCTACAGCT GAATCAGCCA AAAGAAATTA 600 

CTAGGTCCAT TCCGAGGCGC AGTTTGCATG TGAAACGGAG GTCCCCGCAT AACCACGCGG 660 
AACCCGAAAT TCCAGATCCC CATCTCCGCT GCACGGATAA AGGAAACATA CAACCATGAG 720 
TCTCCTAGCC ACGCCAATGC CCCAGGCGGC CACCGCCTCA TCTTCTTCAT CCGCCTCCGC 780 
GGCCGCCTCG GCCAGCGGGA TTCCAATCAC CGCCAACAAC AACCTGCCTT TCGAGAAGGA 840 
CAAGATCTGG TACTTCAGCA ACGATCAGCT GGCCAATTTG CCAAGCAGAA GATGCGGCAT 900 
CAAGGGCGAC GATGAGCTGC AGTACCGCCA GATGACCGCC TATCTGATAC AGGAAATGGG 960 

TCAGCGTCTG CAGGTGTCCC AACTGTGCAT CAACACGGCC ATTGTGTACA TGCATCGGTT 1020 

CTACGCCTTT CACTCCTTCA CCCACTTTCA TCGCAACTCC ATGGCGTCGG CGAGCCTCTT 1080 

CTTGGCCGCC AAGGTAGAAG AGCAACCGCG GAAGCTGGAG CATGTTATTC GGGCCGCCAA 1140 

CAAGTGCCTG CCGCCGACCA CCGAGCAGAA TTACGCCGAA CTCGCCCAGG AGCTTGTGTT 1200

CAACGAGAAC GTGCTCCTGC AGACGCTGGG CTTCGATGTG GCCATCGATC ATCCGCACAC 1260 

GCATGTGGTG CGCACCTGCC AGCTGGTCAA AGCATGCAAG GATCTGGCGC AGACATCGTA 1320 

CTTCTTGGCC TCGAACAGCC TGCATCTGAC CTCGATGTGC CTCCAATATC GCCCCACGGT 1380 

CGTAGCCTGT TTCTGCATTT ACCTAGCCTG CAAGTGGTCC CGATGGGAGA TCCCCCAGTC 1440 

GACCGAGGGC AAGCACTGGT TCTACTATGT GGACAAGACG GTCTCGCTGG ATTTGCTAAA 1500 
GCAGCTGACA GATGAGTTCA TCGCTATCTA TGAGAAGAGC CCGGCCCGTC TGAAGTCTAA 1560

GCTTAACTCG ATCAAGGCGA TCGCCCAGGG AGCCAGCAAT CGGACAGCTA ACAGCAAGGA 1620

CAAACCAAAG GAGGACTGGA AGATCACCGA GATGATGAAG GGCTACCACT CAAACATCAC 1680 

GACACCACCA GAGCTGTTAA ACGGCAACGA CAGCCGGGAT CGGGACCGAG ATCGTGAACG 1740 

GGAGAGAGAG CGGGAACGGG ATCCGTCGTC ACTACTGCCG CCACCGGCTA TGGTGCCGCA 1800 

GCAAAGACGA CAGGATGGTG GACATCAGCG CTCGTCCTCA GTGAGCGGAG TGCCAGGCAG 1860 

CAGCTCTTCG TCGTCTTCCT CCAGTCACAA GATGCCAAAT TACCCTGGTG GCATGCCGCC 1920 

CGAAGCTCAT CCGGATCACA AGTCAAAGCA GCCGGGCTAT AACAATCGAA TGCCCTCAAG 1980 

TCACCAGCGT AGTAGTAGCA GTGGACTCGG TTCCTCGGGA AGTGGCAGCC AGCACAGCAG 2040 

CTCATCCTCG TCGTCTTCAA GCCAGCAGCC TGGCCGACCG TCTATGCCCG TGGACTATCA 2100 

CAAATCCTCT CGCGGCATGC CGCCGGTAGG CGTGGGCATG CCACCTCACG GGAGCCACAA 2160 

GATGACTTCG GGCTCCAAGC CTCAACAGCC GCAGCAGCAG CCGGTCCCAC ATCCATCCGC 2220 

CTCTAATTCC TCTGCATCGG GCATGTCCTC CAAGGATAAA TCCCAGAGCA ACAAAATGTA 2280 

TCCGAACGCA CCGCCGCCAT ACAGTAATAG TGCCCCTCAA AACCCGCTGA TGTCGCGTGG 2340 

TGGATATCCA GGCGCTAGCA ATGGATCCCA GCCCCCGCCT CCCGCCGGAT ACGGCGGCCA 2400 

TCGCAGCAAA TCCGGCTCCA CCGTCCATGG CATGCCGCAT TTCGAGCAGC AATTGCCCTA 2460 

TTCCCAGAGC CAGAGCTACG GCCACATGCA GCAGCAGCCA GTGCCTCAGT CTCAGCAGCA 2520 

ACAGATGCCT CCGGAGGCAT CCCAGCACTC GTTGCAGTCC AAGAACTCGC TCTTCAGTCC 2580 

AGAGTGGCCA GACATTAAAA AGGAGCCCAT GTCGCAGTCG CAACCACAGC TTTTTAACGG 2640 

TTTGCTACCC CCTCCTGCGC CTCCCGGCCA CGATTACAAG CTAAATAGCC ATCCGCGCGA 2700 

CAAAGAAAGT CCCAAGAAAG AGCGACTAAC GCCAACCAAA AAGGATAAGC ACCGTCCTGT 2760 

AATGCCCCCA ATGGGCAGTG GGAACAGTTC CTCCGGCTCG GGATCATCAA AGCCGATGCT 2820 

ACCGCCTCAC AAGAAGCAGA TACCCCATGG CGGGGACCTG TTGACCAATC CTGGAGAGAG 2880 

TGGAAGCCTA AAACGGCCCA ACGAGATCTC GGGAAGTCAG TATGGACTAA ATAAGCTGGA 2940 

TGAAATAGAT AACAGTAATA TGCCTCGAGA AAAGCTTCGC AAGCTGGACA CTACAACTGG 3000 

ACTACCAACT TATCCGAATT ATGAGGAGAA ACACACGCCT CTGAATATGT CCAACGGAAT 3060

CGAGACAACG CCGGATCTGG TGCGCAGTTT GCTAAAGGAG AGTCTGTGTC CATCGAACGC 3120 

TTCGCTCCTG AAACCGGATG CCTTGACTAT GCCTGGCCTG AAACCACCGG CCGAACTACT 3180

TGAGCCCATG CCCGCACCAG CGACAATCAA GAAAGAACAG GGAATAACTC CGATGACCAG 3240 

TTTGGCTAGT GGGCCCGCAC CCATGGATTT GGAAGTACCC ACTAAACAGG CCGGAGAGAT 3300 

TAAGGAGGAA AGCAGCAGCA AGTCCGAAAA GAAAAAGAAG AAGGATAAAC ACAAACACAA 3360

GGAGAAGGAC AAGTCCAAGG ACAAGACGGA AAAGGAGGAG CGTAAGAAGC ACAAGAGGGA 3420 

CAAGCAGAAG GATCGTAGCG GCAGCGGTGG CAGCAAGGAC AGTTCTCTTC CCAATGAGCC 3480 

TCTGAAGATG GTTATCAAGA ATCCCAACGG CAGCCTGCAG GCCGGTGCGT CAGCTCCCAT 3540 

TAAACTTAAG ATCAGCAAAA ATAAGGTTGA ACCCAATAAC TACTCTGCAG CGGCGGGTCT 3600

GCCTGGCGCA ATCGGATATG GCTTGCCTCC AACTACGGCT ACCACCACAT CCGCTTCGAT 3660 

CGGAGCAGCT GCTCCTGTTC TGCCTCCTTA TGGTGCCGGC GGTGGTGGCT ACAGCTCATC 3720 

GGGCGGCAGC AGTTCCGGTG GCAGCAGCAA GAAAAAGCAC AGCGATCGTG ACCGCGACAA 3780 

GGAGAGCAAA AAGAATAAGA GCCAAGACTA CGCGAAGTAC AATGGCGCTG GTGGCGGCAT 3840 

CTTTAATCCC CTTGGCGGTG CTGGCGCCGC ACCCAATATG TCTGGAGGAA TGGGCGCCCC 3900 

CATGTCTACT GCTGTACCAC CATCCATGCT GTTGGCGCCC ACCGGTGCAG TACCACCCTC 3960 

TGCCGCTGGG CTGGCACCGC CTCCCATGCC CGTCTACAAC AAGAAGTAGT GGTAGCGGTC 4020 

AGAGGGTTAT TCTTAAGTCG TACGTTTTGA TATATGTATA GAACCTCAGT AAGTCCGATT 4080 

GTAGTATAGT TGTTAGGATT GTTAGTGAGA TGCATTATTG ATTTTAGTTA AGCACATAGA 4140 

TAAAACTCCA AATTGGAAGT GAAACCGGAT GCGCAGATCG AAGAAGAATG GAAGTAGATG 4200 

TCGCGATGGG GCTGGACGTA AAAGCAGTAC TCAAATCGCG AAAACTTTTG TACAGCATTA 4260 

ATTAGTTTAT AACTATAATA AATAGCATAC ATATAAGCCC AAAAAAAAAA AAAAAAAAAA 4320 
AAAAAAAA 4328

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1097 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Ser Leu Leu Ala Thr Pro Met Pro Gln Ala Ala Thr Ala Ser Ser
1 5 10 15

Ser Ser Ser Ala Ser Ala Ala Ala Ser Ala Ser Gly Ile Pro Ile Thr
20 25 30

Ala Asn Asn Asn Leu Pro Phe Glu Lys Asp Lys Ile Trp Tyr Phe Ser
35 40 45

Asn Asp Gln Leu Ala Asn Leu Pro Ser Arg Arg Cys Gly Ile Lys Gly
50 55 60

Asp Asp Glu Leu Gln Tyr Arg Gln Met Thr Ala Tyr Leu Ile Gln Glu
65 70 75 80

Met Gly Gln Arg Leu Gln Val Ser Gln Leu Cys Ile Asn Thr Ala Ile
85 90 95

Val Tyr Met His Arg Phe Tyr Ala Phe His Ser Phe Thr His Phe His
100 105 110

Arg Asn Ser Met Ala Ser Ala Ser Leu Phe Leu Ala Ala Lys Val Glu
115 120 125

Glu Gln Pro Arg Lys Leu Glu His Val Ile Arg Ala Ala Asn Lys Cys
130 135 140

Leu Pro Pro Thr Thr Glu Gln Asn Tyr Ala Glu Leu Ala Gln Glu Leu
145 150 155 160

Val Phe Asn Glu Asn Val Leu Leu Gln Thr Leu Gly Phe Asp Val Ala
165 170 175

Ile Asp His Pro His Thr His Val Val Arg Thr Cys Gln Leu Val Lys
180 185 190

Ala Cys Lys Asp Leu Ala Gln Thr Ser Tyr Phe Leu Ala Ser Asn Ser
195 200 205

Leu His Leu Thr Ser Met Cys Leu Gln Tyr Arg Pro Thr Val Val Ala
210 215 220

Cys Phe Cys Ile Tyr Leu Ala Cys Lys Trp Ser Arg Trp Glu Ile Pro
225 230 235 240

Gln Ser Thr Glu Gly Lys His Trp Phe Tyr Tyr Val Asp Lys Thr Val
245 250 255

Ser Leu Asp Leu Leu Lys Gln Leu Thr Asp Glu Phe Ile Ala Ile Tyr
260 265 270

Glu Lys Ser Pro Ala Arg Leu Lys Ser Lys Leu Asn Ser Ile Lys Ala
275 280 285

Ile Ala Gln Gly Ala Ser Asn Arg Thr Ala Asn Ser Lys Asp Lys Pro
290 295 300

Lys Glu Asp Trp Lys Ile Thr Glu Met Met Lys Gly Tyr His Ser Asn
305 310 315 320

Ile Thr Thr Pro Pro Glu Leu Leu Asn Gly Asn Asp Ser Arg Asp Arg
325 330 335

Asp Arg Asp Arg Glu Arg Glu Arg Glu Arg Glu Arg Asp Pro Ser Ser
340 345 350

Leu Leu Pro Pro Pro Ala Met Val Pro Gln Gln Arg Arg Gln Asp Gly
355 360 365

Gly His Gln Arg Ser Ser Ser Val Ser Gly Val Pro Gly Ser Ser Ser
370 375 380

Ser Ser Ser Ser Ser Ser His Lys Met Pro Asn Tyr Pro Gly Gly Met
385 390 395 400

Pro Pro Glu Ala His Pro Asp His Lys Ser Lys Gln Pro Gly Tyr Asn
405 410 415

Asn Arg Met Pro Ser Ser His Gln Arg Ser Ser Ser Ser Gly Leu Gly
420 425 430

Ser Ser Gly Ser Gly Ser Gln His Ser Ser Ser Ser Ser Ser Ser Ser
435 440 445

Ser Gln Gln Pro Gly Arg Pro Ser Met Pro Val Asp Tyr His Lys Ser
450 455 460

Ser Arg Gly Met Pro Pro Val Gly Val Gly Met Pro Pro His Gly Ser
465 470 475 480

His Lys Met Thr Ser Gly Ser Lys Pro Gln Gln Pro Gln Gln Gln Pro
485 490 495

Val Pro His Pro Ser Ala Ser Asn Ser Ser Ala Ser Gly Met Ser Ser
500 505 510

Lys Asp Lys Ser Gln Ser Asn Lys Met Tyr Pro Asn Ala Pro Pro Pro
515 520 525

Tyr Ser Asn Ser Ala Pro Gln Asn Pro Leu Met Ser Arg Gly Gly Tyr
530 535 540

Pro Gly Ala Ser Asn Gly Ser Gln Pro Pro Pro Pro Ala Gly Tyr Gly
545 550 555 560

Gly His Arg Ser Lys Ser Gly Ser Thr Val His Gly Met Pro His Phe
565 570 575

Glu Gln Gln Leu Pro Tyr Ser Gln Ser Gln Ser Tyr Gly His Met Gln
580 585 590

Gln Gln Pro Val Pro Gln Ser Gln Gln Gln Gln Met Pro Pro Glu Ala
595 600 605

Ser Gln His Ser Leu Gln Ser Lys Asn Ser Leu Phe Ser Pro Glu Trp
610 615 620

Pro Asp Ile Lys Lys Glu Pro Met Ser Gln Ser Gln Pro Gln Leu Phe
625 630 635 640

Asn Gly Leu Leu Pro Pro Pro Ala Pro Pro Gly His Asp Tyr Lys Leu
645 650 655

Asn Ser His Pro Arg Asp Lys Glu Ser Pro Lys Lys Glu Arg Leu Thr
660 665 670

Pro Thr Lys Lys Asp Lys His Arg Pro Val Met Pro Pro Met Gly Ser
675 680 685

Gly Asn Ser Ser Ser Gly Ser Gly Ser Ser Lys Pro Met Leu Pro Pro
690 695 700

His Lys Lys Gln Ile Pro His Gly Gly Asp Leu Leu Thr Asn Pro Gly
705 710 715 720

Glu Ser Gly Ser Leu Lys Arg Pro Asn Glu Ile Ser Gly Ser Gln Tyr
725 730 735

Gly Leu Asn Lys Leu Asp Glu Ile Asp Asn Ser Asn Met Pro Arg Glu
740 745 750

Lys Leu Arg Lys Leu Asp Thr Thr Thr Gly Leu Pro Thr Tyr Pro Asn
755 760 765

Tyr Glu Glu Lys His Thr Pro Leu Asn Met Ser Asn Gly Ile Glu Thr
770 775 780

Thr Pro Asp Leu Val Arg Ser Leu Leu Lys Glu Ser Leu Cys Pro Ser
785 790 795 800

Asn Ala Ser Leu Leu Lys Pro Asp Ala Leu Thr Met Pro Gly Leu Lys
805 810 815

Pro Pro Ala Glu Leu Leu Glu Pro Met Pro Ala Pro Ala Thr Ile Lys
820 825 830

Lys Glu Gln Gly Ile Thr Pro Met Thr Ser Leu Ala Ser Gly Pro Ala
835 840 845

Pro Met Asp Leu Glu Val Pro Thr Lys Gln Ala Gly Glu Ile Lys Glu
850 855 860

Glu Ser Ser Ser Lys Ser Glu Lys Lys Lys Lys Lys Asp Lys His Lys
865 870 875 880

His Lys Glu Lys Asp Lys Ser Lys Asp Lys Thr Glu Lys Glu Glu Arg
885 890 895

Lys Lys His Lys Arg Asp Lys Gln Lys Asp Arg Ser Gly Ser Gly Gly
900 905 910

Ser Lys Asp Ser Ser Leu Pro Asn Glu Pro Leu Lys Met Val Ile Lys
915 920 925

Asn Pro Asn Gly Ser Leu Gln Ala Gly Ala Ser Ala Pro Ile Lys Leu
930 935 940

Lys Ile Ser Lys Asn Lys Val Glu Pro Asn Asn Tyr Ser Ala Ala Ala
945 950 955 960

Gly Leu Pro Gly Ala Ile Gly Tyr Gly Leu Pro Pro Thr Thr Ala Thr
965 970 975

Thr Thr Ser Ala Ser Ile Gly Ala Ala Ala Pro Val Leu Pro Pro Tyr
980 985 990

Gly Ala Gly Gly Gly Gly Tyr Ser Ser Ser Gly Gly Ser Ser Ser Gly
995 1000 1005

Gly Ser Ser Lys Lys Lys His Ser Asp Arg Asp Arg Asp Lys Glu Ser
1010 1015 1020

Lys Lys Asn Lys Ser Gln Asp Tyr Ala Lys Tyr Asn Gly Ala Gly Gly
1025 1030 1035 1040

Gly Ile Phe Asn Pro Leu Gly Gly Ala Gly Ala Ala Pro Asn Met Ser
1045 1050 1055

Gly Gly Met Gly Ala Pro Met Ser Thr Ala Val Pro Pro Ser Met Leu
1060 1065 1070

Leu Ala Pro Thr Gly Ala Val Pro Pro Ser Ala Ala Gly Leu Ala Pro
1075 1080 1085

Pro Pro Met Pro Val Tyr Asn Lys Lys
1090 1095



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1119 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1116 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

ATG GCA AAG CAG TAC GAC TCG GTG GAG TGC CCT TTT TGT GAT GAA GTT 48
Met Ala Lys Gln Tyr Asp Ser Val Glu Cys Pro Phe Cys Asp Glu Val
1 5 10 15

TCC AAA TAC GAG AAG CTC GCC AAG ATC GGC CAA GGC ACC TTC GGG GAG 96
Ser Lys Tyr Glu Lys Leu Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu
20 25 30

GTG TTC AAG GCC AGG CAC CGC AAG ACC GGC CAG AAG GTG GCT CTG AAG 144
Val Phe Lys Ala Arg His Arg Lys Thr Gly Gln Lys Val Ala Leu Lys
35 40 45

AAG GTG CTG ATG GAA AAC GAG AAG GAG GGG TTC CCC ATT ACA GCC TTG 192
Lys Val Leu Met Glu Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu
50 55 60

CGG GAG ATC AAG ATC CTT CAG CTT CTA AAA CAC GAG AAT GTG GTC AAC 240
Arg Glu Ile Lys Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn
65 70 75 80

TTG ATT GAG ATT TGT CGA ACC AAA GCT TCC CCC TAT AAC CGC TGC AAG 288
Leu Ile Glu Ile Cys Arg Thr Lys Ala Ser Pro Tyr Asn Arg Cys Lys
85 90 95

GGT AGT ATA TAC CTG GTG TTC GAC TTC TGC GAG CAT GAC CTT GCT GGG 336
Gly Ser Ile Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly
100 105 110

CTG TTG AGC AAT GTT TTG GTC AAG TTC ACG CTG TCT GAG ATC AAG AGG 384
Leu Leu Ser Asn Val Leu Val Lys Phe Thr Leu Ser Glu Ile Lys Arg
115 120 125

GTG ATG CAG ATG CTG CTT AAC GGC CTC TAC TAC ATC CAC AGA AAC AAG 432
Val Met Gln Met Leu Leu Asn Gly Leu Tyr Tyr Ile His Arg Asn Lys
130 135 140

ATC CTG CAT AGG GAC ATG AAG GCT GCT AAT GTG CTT ATC ACT CGT GAT 480
Ile Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Arg Asp
145 150 155 160

GGG GTC CTG AAG CTG GCA GAC TTT GGG CTG GCC CGG GCC TTC AGC CTG 528
Gly Val Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Leu
165 170 175

GCC AAG AAC AGC CAG CCC AAC CGC TAC ACC AAC CGT GTG GTG ACA CTC 576
Ala Lys Asn Ser Gln Pro Asn Arg Tyr Thr Asn Arg Val Val Thr Leu
180 185 190

TGG TAC CGG CCC CCG GAG CTG TTG CTC GGG GAG CGG GAC TAC GGC CCC 624
Trp Tyr Arg Pro Pro Glu Leu Leu Leu Gly Glu Arg Asp Tyr Gly Pro
195 200 205

CCC ATT GAC CTG TGG GGT GCT GGG TGC ATC ATG GCA GAG ATG TGG ACC 672
Pro Ile Asp Leu Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr
210 215 220

CGC AGC CCC ATC ATG CAG GGC AAC ACG GAG CAG CAC CAA CTC GCC CTC 720
Arg Ser Pro Ile Met Gln Gly Asn Thr Glu Gln His Gln Leu Ala Leu
225 230 235 240

ATC AGT CAG CTC TGC GGC TCC ATC ACC CCT GAG GTG TGG CCA AAC GTG 768
Ile Ser Gln Leu Cys Gly Ser Ile Thr Pro Glu Val Trp Pro Asn Val
245 250 255

GAC AAC TAT GAG CTG TAC GAA AAG CTG GAG CTG GTC AAG GGC CAG AAG 816
Asp Asn Tyr Glu Leu Tyr Glu Lys Leu Glu Leu Val Lys Gly Gln Lys
260 265 270

CGG AAG GTG AAG GAC AGG CTG AAG GCC TAT GTG CGT GAC CCA TAC GCA 864
Arg Lys Val Lys Asp Arg Leu Lys Ala Tyr Val Arg Asp Pro Tyr Ala
275 280 285

CTG GAC CTC ATC GAC AAG CTG CTG GTG CTG GAC CCT GCC CAG CGC ATC 912
Leu Asp Leu Ile Asp Lys Leu Leu Val Leu Asp Pro Ala Gln Arg Ile
290 295 300

GAC AGC GAT GAC GCC CTC AAC CAC GAC TTC TTC TGG TCC GAC CCC ATG 960
Asp Ser Asp Asp Ala Leu Asn His Asp Phe Phe Trp Ser Asp Pro Met
305 310 315 320

CCC TCC GAC CTC AAG GGC ATG CTC TCC ACC CAC CTG ACG TCC ATG TTC 1008
Pro Ser Asp Leu Lys Gly Met Leu Ser Thr His Leu Thr Ser Met Phe
325 330 335

GAG TAC TTG GCA CCA CCG CGC CGG AAG GGC AGC CAG ATC ACC CAG CAG 1056
Glu Tyr Leu Ala Pro Pro Arg Arg Lys Gly Ser Gln Ile Thr Gln Gln
340 345 350

TCC ACC AAC CAG AGT CGC AAT CCC GCC ACC ACC AAC CAG ACG GAG TTT 1104
Ser Thr Asn Gln Ser Arg Asn Pro Ala Thr Thr Asn Gln Thr Glu Phe
355 360 365

GAG CGC GTC TTC TGA 1119
Glu Arg Val Phe
370

(2) INFORMATION FOR SEQ ID NO: 6:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 372 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Met Ala Lys Gln Tyr Asp Ser Val Glu Cys Pro Phe Cys Asp Glu Val
1 5 10 15

Ser Lys Tyr Glu Lys Leu Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu
20 25 30

Val Phe Lys Ala Arg His Arg Lys Thr Gly Gln Lys Val Ala Leu Lys
35 40 45

Lys Val Leu Met Glu Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu
50 55 60

Arg Glu Ile Lys Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn
65 70 75 80

Leu Ile Glu Ile Cys Arg Thr Lys Ala Ser Pro Tyr Asn Arg Cys Lys
85 90 95

Gly Ser Ile Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly
100 105 110

Leu Leu Ser Asn Val Leu Val Lys Phe Thr Leu Ser Glu Ile Lys Arg
115 120 125

Val Met Gln Met Leu Leu Asn Gly Leu Tyr Tyr Ile His Arg Asn Lys
130 135 140

Ile Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Arg Asp
145 150 155 160

Gly Val Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Leu
165 170 175

Ala Lys Asn Ser Gln Pro Asn Arg Tyr Thr Asn Arg Val Val Thr Leu
180 185 190

Trp Tyr Arg Pro Pro Glu Leu Leu Leu Gly Glu Arg Asp Tyr Gly Pro
195 200 205

Pro Ile Asp Leu Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr
210 215 220

Arg Ser Pro Ile Met Gln Gly Asn Thr Glu Gln His Gln Leu Ala Leu
225 230 235 240

Ile Ser Gln Leu Cys Gly Ser Ile Thr Pro Glu Val Trp Pro Asn Val
245 250 255

Asp Asn Tyr Glu Leu Tyr Glu Lys Leu Glu Leu Val Lys Gly Gln Lys
260 265 270

Arg Lys Val Lys Asp Arg Leu Lys Ala Tyr Val Arg Asp Pro Tyr Ala
275 280 285

Leu Asp Leu Ile Asp Lys Leu Leu Val Leu Asp Pro Ala Gln Arg Ile
290 295 300

Asp Ser Asp Asp Ala Leu Asn His Asp Phe Phe Trp Ser Asp Pro Met
305 310 315 320

Pro Ser Asp Leu Lys Gly Met Leu Ser Thr His Leu Thr Ser Met Phe
325 330 335

Glu Tyr Leu Ala Pro Pro Arg Arg Lys Gly Ser Gln Ile Thr Gln Gln
340 345 350

Ser Thr Asn Gln Ser Arg Asn Pro Ala Thr Thr Asn Gln Thr Glu Phe
355 360 365

Glu Arg Val Phe
370



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
ACGAATTCCA CACAATCCAA AGATC 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CAGAATTCCT ATTGCCGATC CCCAGA 



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(8, 14)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A or C or G or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "Y = C or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(17, 20)

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "R = A or G" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGAATTCNAT GYTNCARCAR CC 
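
The modified_base notes above use the standard IUPAC ambiguity codes (N, Y, R; W and S appear in later primers). As an illustrative sketch only, and not part of the original listing, the following Python snippet counts and enumerates the concrete oligonucleotides represented by a degenerate primer such as SEQ ID NO: 9. The names `IUPAC`, `degeneracy`, `expand` and `SEQ_ID_NO_9` are hypothetical helpers introduced here for illustration.

```python
from itertools import product

# IUPAC ambiguity codes limited to those used in this listing (assumed subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # R = A or G
    "Y": "CT",    # Y = C or T
    "W": "AT",    # W = A or T
    "S": "CG",    # S = C or G
    "N": "ACGT",  # N = A or C or G or T
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.replace(" ", "").upper():
        count *= len(IUPAC[base])
    return count

def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.replace(" ", "").upper()]
    return ["".join(bases) for bases in product(*pools)]

# SEQ ID NO: 9 as printed in the listing (spaces are only visual grouping).
SEQ_ID_NO_9 = "GGAATTCNAT GYTNCARCAR CC"

if __name__ == "__main__":
    print(degeneracy(SEQ_ID_NO_9))
    print(expand(SEQ_ID_NO_9)[0])
```

With N at two positions, Y at one and R at two, SEQ ID NO: 9 represents 4 x 2 x 4 x 2 x 2 = 128 distinct 22-mers.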



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(13, 16, 19, 22, 25)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "R = A or G"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
AACTGCAGTC CARAARAART CRTGRTT 

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
TGTCAAGGAT CAAACCGGCT GTGAT 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 
CGAATTCCAA GAAACGCATC GATGC 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 
AGACCTGCCA AATCGTGT 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 
AGAAGGTGGA TCTGTAACCA TTCGT 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 
GGAATTCAGA TCTCGATCAG ATTCA 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16 
TTACTACTCG AGCTACCAAA CCCGGTC 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
TAAGCAAGCT TCTATGGCGC ACATGTCC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
TTACTACTCG AGCTACCAAA CCCGGTC 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(13, 16, 22)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "Y = C or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 17

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "W = A or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 18

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "S = C or G"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 19

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A or C or G or T"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

GGAATTCTGG TAYTTYWSNA AYGA 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 11 

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "Y = C or T" 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 14

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "R = A or G" 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(17, 20) 

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "N = A or C or G or T" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

CGGGATCCTG YTCRAANGGN GGCAT 25 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(11, 14, 20)

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "N = A or C or G or T" 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 23 

(D) OTHER INFORMATION: /mod_base= OTHER 
/note= "R = A or G"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

CGGGATCCAA NGGNGGCATN CCRT 24 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

ATCACGACAC CACCAGAGCT GTTA 24 

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
CGAATTCAGA TCGTGAACGG GA 



(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
CGAATTCAGG CGCTAGCAAT G 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
GAAAGGCGTA GAACCGA 



(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26 
GCTGACCCAT TTCCTGTATC AGATAG 

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
GGAATTCTTC TGCTTGGCGA AT 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
GGGAATTCGA GGTTCTATAC ATAT 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 
CTGTGTGAAT GGAATCTGTG ATGTG 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
TATCCCGGGT CATATGAGTC TCCTAGCC 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Met Leu Gln Gln Pro Ser Gly Ser Thr Pro Ser Asn Val
1 5 10



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Ala Asp Thr Ala Leu Asn His Asp Phe Phe Trp Thr Asp Pro Met Pro 
1 5 10 15

Ser 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

Met Leu Gln Gln Pro
1 5 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Asn His Asp Phe Phe Trp Thr 
1 5 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 

Ser Pro Glu Trp Pro Asp Ile
1 5 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

Trp Tyr Phe Ser Asn Asp Gln Leu Ala Asn Ser Pro Ser Arg
1 5 10



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Thr Val His Gly Met Pro Pro Phe Glu Gln Gln Leu Pro Tyr
1 5 10



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 

Trp Tyr Phe Ser Asn Asp 
1 5 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 

Met Pro Pro Phe Glu Gln
1 5 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 

His Gly Met Pro Pro Phe 
1 5 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GCAGGATCCA GAATTCCATA TGGCAAAGCA GTACGACTCG 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

CAGTACTCGA GTTATCAGAA GACGCGCTCA AAC 33 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4528 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

GGGGGGGGGG GGGTGAATGA AGGAGCGGGC GGAGGAGGAA TTGTCATGGC GTCGGGCCGT 60 

GGAGCTTCTT CTCGCTGGTT CTTTACTCGG GAACAGCTGG AGAACACGCC GAGCCGCCGC 120 

TGCGGAGTGG AGGCGGATAA AGAGCTCTCG TGCCGCCAGC AGGCGGCCAA CCTCATCCAG 180 

GAGATGGGAC AGCGTCTCAA TGTCTCTCAG CTTACAATAA ACACTGCGAT TGTTTATATG 240 

CACAGGTTTT ATATGCACCA TTCTTTCACC AAATTCAACA AAAATATAAT ATCGTCTACT 300

GCATTATTTT TGGCTGCAAA AGTGGAAGAA CAGGCTCGAA AACTTGAACA TGTTATCAAA 360 

GTAGCACATG CTTGTCTTCA TCCTCTAGAG CCACTGCTGG ATACTAAATG TGATGCTTAC 420

CTTCAACAGA CTCAAGAACT GGTTATACTT GAAACCATAA TGCTACAAAC TCTAGGTTTT 480 

GAGATCACCA TTGAACACCC ACACACAGAT GTGGTGAAAT GTACCCAGTT AGTAAGAGCA 540 

AGCAAGGATT TGGCACAGAC ATCCTATTTC ATGGCTACCA ACAGTCTGCA TCTTACAACC 600 

TTCTGTCTTC AGTACAAACC AACAGTGATA GCATGTGTAT GCATTCATTT GGCTTGCAAA 660 

TGGTCCAATT GGGAGATCCC TGTATCAACT GATGGAAAGC ATTGGTGGGA ATATGTGGAT 720

CCTACAGTTA CTCTAGAATT ATTAGATGAG CTAACACATG AGTTTCTACA AATATTGGAG 780 

AAAACGCCTA ATAGGTTGAA GAAGATTCGA AACTGGAGGG CTAATCAGGC AGCTAGGAAA 840 

CCAAAAGTAG ATGGACAGGT ATCAGAGACA CCACTTCTTG GTTCATCTTT GGTCCAGAAT 900 

TCCATTTTAG TAGATAGTGT CACTGGTGTG CCTACAAACC CAAGTTTTCA GAAACCATCT 960 

ACATCAGCAT TCCCTGCGCC AGTACCTCTA AATTCAGGAA ATATTTCTGT TCAAGACAGC 1020

CATACATCTG ATAATTTGTC AATGCTAGCA ACAGGAATGC CAAGTACTTC ATACGGTTTA 1080 






TCATCACACC AGGAATGGCC TCAACATCAA GACTCAGCAA GGACAGAACA GCTATATTCA 1140 

CAGAAACAGG AGACATCTTT GTCTGGTAGC CAGTACAACA TCAACTTCCA GCAGGGACCT 1200 

TCTATATCAC TGCATTCAGG ATTACATCAC AGACCTGACA AAATTTCAGA TCATTCTTCT 1260 

GTTAAGCAAG AATATACTCA TAAAGCAGGG AGCAGTAAAC ACCATGGGCC AATTTCCACT 1320 

ACTCCAGGAA TAATTCCTCA GAAAATGTCT TTAGATAAAT ATAGAGAAAA GCGTAAACTA 1380 

GAAACTCTTG ATCTCGATGT AAGGGATCAT TATATAGCTG CCCAGGTAGA ACAGCAGCAC 1440 

AAACAAGGGC AGTCACAGGC AGCCAGCAGC AGTTCTGTTA CTTCTCCCAT TAAAATGAAA 1500 

ATACCTATCG CAAATACTGA AAAATACATG GCAGATAAAA AGGAAAAGAG TGGGTCACTG 1560 

AAATTACGGA TTCCAATACC ACCCACTGAT AAAAGCGCCA GTAAAGAAGA ACTGAAAATG 1620 

AAAATAAAAG TTTCTTCTTC AGAAAGACAC AGCTCTTCTG ATGAAGGCAG TGGGAAAAGC 1680

AAACATTCAA GCCCACATAT TAGCAGAGAC CATAAGGAGA AGCACAAGGA GCATCCTTCA 1740 

AGCCGCCACC ACACCAGCAG CCACAAGCAT TCCCACTCGC ATAGTGGCAG CAGCAGCGGT 1800 

GGCAGTAAAC ACAGTGCCGA CGGAATACCA CCCACTGTTC TGAGGAGTCC TGTTGGCCTG 1860 

AGCAGTGATG GCATTTCCTC TAGCTCCAGC TCTTCAAGGA AGAGGCTGCA TGTCAATGAT 1920

GCATCTCACA ACCACCACTC CAAAATGAGC AAAAGTTCCA AAAGTTCAGG TGGGCTACGG 1980 

ACATCTCAGC ACCTCGTGAA ACTGGACAAG AAGCCAGTGG AGACCAACGG TCCTGATGCC 2040 

AATCACGAGT ACAGTACAAG CAGCCAGCAT ATGGACTACA AAGACACATT CGACATGCTG 2100 

GACTCACTGT TAAGTGCCCA AGGAATGAAC ATGTAATAAT TTGTTTAGGT CAATTTTTCC 2160 

TTTACTTTTT TAATTTAAAA ATTGTTAGAA TGGAAAAATT CCTTCTGATC TAGCAGTGGT 2220 

AACCCCTGCT GTTGCTGCCA CTGCTTCAAT ATTTGTAAGT GCTACTTTAT TCTTCATTCT 2280 

GAAAAGAAGA GATTATAGTA AACAAGTCTT TATCTCCACA TATGATAGTG TTATAAATAC 2340 

TGTAAAGGCA TGGAAGGTGC AAAACTCAGT ATTTCTACAA TTGCAGCTAA GAACATTAGG 2400 

ATGAATGGCT GGCTGCTTCT AGGAATATAA GATGCCTCAA GCATTCATTA TTTATGATTT 2460 

GAATACTGTA GCTATTTTTT GTTGCTTGGC TTTTGAATGA GTGTAAATTG TTTTCTTTTG 2520 

TGTATTTATA CTTGTATGTA TGATTTGCAT GTTTCAATGA TAAAGGGATA AAACAGTATA 2580 

CTGACAACTG TTTACAAGAA AGTGGAGAAA ATGTACTACA TTTTGTATGT TTAGATATTA 2640 

CCGTAAATAC TCAGGATTGG AGCTGCTTGT AAGTATAACA ATATACAGAA TACTTTATTT 2700

TATCTTGTCA GAGTTCCATC ACTATCTAAA ACAAAGGTGC AATTTTTTAT GTTAACCTTA 2760

AATCTAGCCC TTACTGGAAG CCACTGATAG GGACATTCAC TACCAGATGT GTGCAGTGCA 2820 

GCAGATGGTC ATATAACACT GTGAGGCACT GAATTTTGCC TTCAGAGGTT CTGACCAGAT 2880

TGGCTGCTGA AATAGCCCCT AACTTTCTGA AGGCTTGAAG AGGAAAAAAT AAAGTTTACA 2940

TACTCTTGAT GGAAGTGCAT TTAAATGTTT GTTGGCTTGT TGCAGTTCTA TGAAACAGAG 3000

CTGTTAATAA TGGTTATGTG GATTACTGTG ATTTGAAAAC TAAATTCACA ATAACTTACC 3060 

TAGTAGAGAT TTAGTGAGTT GTTTCCTTTA AAGAATTTTA CACTACATAT TTTAATAGTA 3120 

AACAGGGTCA CTTTCCTTTA GCATTCAGAA TGACACCATA TTCTTAAATA TACTCCTTCC 3180 

CTGAAGCGTG TTTGTGTGTG ATGCCATATT TCTTTTTCAG GTAAATGTAG TCTTCCTTAT 3240 

AAAAATGAAA TTAAACCTAT GCTCTCAATT CTTTTATATT CTAACAATAA ATAAAAAAGA 3300

AAAGATTACT GACTGTGCAT TGTACCTGTA TTTATAGTTT ATGGTTATCA GAAGCTCTGT 3360 

AAGAAAGAAA AGGTCAGCTC CCAGGCAAAC CAGTAGTGGA GGTTTTACAT TTGTTTGCAC 3420 

ATCTCAGTAT ATTTCTGTTG AGGTAAAGTT TGCACAGTCA TCTGACTTCT GATCAAGCAT 3480 

TAGATTTTAA CTTGTTTAGA TTTTGTCTTA AACACCAGTA ATATGGCTCT TGTTTATCAG 3540 

CTAATCTTGA ATTTATTCTG TGGTAAATCT TTTGAGTTGC TGAGTATATT TGAGATTGAT 3600 

TGGATTCAAC CTCTTGTTGA ACTGAAAACT TAATTTTTTC TCTGTATTTT TGTTACAAAG 3660 

CCACTGATAC GTGCACAATT GTAATTAAGT ATGTTGCAGT TGTAAATATT AGAGTTTAAT 3720

CTCATGCTCT ACCTTTATTT AGCAATTACC TAATTTGCCA GTAGCTTTAT AATTTTTAAA 3780 

GATAATTGTT CATTATTTTG TCAATGTTAT TTGAACTTGG GGTACTTAGG AGCCTCTTTG 3840 

TAGGGACTGT GCCTAGGTAG CATGTCCTAA CATTTGTTCT GGTCTTGCAT AACTTCAGTA 3900 

TCTTTGTCAT TATATGTAAC TTTGTTGCTC TGTATGGCAT AATATTGTAT CCATAAACAT 3960 

GGTAATTTTG ATACAGTTAT ACTTTTACAG TGGTACATAA TCCAAGGACT AGTATAGAAT 4020 

TAAGCTGAGT GCAAGATGAG GGAGGGAAGG GCTTTCTTGG TAATTTAGAT GTGAAACCTC 4080 

TACAGAGCTA TCATGTAAAA ACTACATGAG GTGGTTGTGC TACTGTATAA TTGGGGGTGA 4140 

TAATACCAGG AATTTTAATA AGATTTTGTA AAGAATATCC AGAAAAGTAG TGAACTTATT 4200

TTCAGTAGGC ATAGAAAACA ATGTGAATAT TTAAGGTCTG TGACTATAGT TAAACTTCAC 4260

TAAGAATTTG CAGAATTGTT TTGAGATGTG TGAATAAAGG TAATTTTATT GAATCTTCAT 4320

TGGTGCTAAT GTTGGACAGT TAAAAAGATA GCTAGTGTAT ATTGTTATGG GTCAGTACTT 4380

ATTAGTACTT CCAAAATTGA ATTTGAAATG CTATGTATTC ACTTTTCACT CTGTAAATGT 4440 

AATTCTTTAC AATGACTTTA TTTATTAAAG GGCAGCCAGT TGTCATTTGT AAAAAAAAAA 4500 

AAAAAAAAAA AAAGCGGCCG CTGAATTC 4528 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2091 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

ATGGCGTCGG GCCGTGGAGC TTCTTCTCGC TGGTTCTTTA CTCGGGAACA GCTGGAGAAC 60 

ACGCCGAGCC GCCGCTGCGG AGTGGAGGCG GATAAAGAGC TCTCGTGCCG CCAGCAGGCG 120 

GCCAACCTCA TCCAGGAGAT GGGACAGCGT CTCAATGTCT CTCAGCTTAC AATAAACACT 180 

GCGATTGTTT ATATGCACAG GTTTTATATG CACCATTCTT TCACCAAATT CAACAAAAAT 240

ATAATATCGT CTACTGCATT ATTTTTGGCT GCAAAAGTGG AAGAACAGGC TCGAAAACTT 300 

GAACATGTTA TCAAAGTAGC ACATGCTTGT CTTCATCCTC TAGAGCCACT GCTGGATACT 360 

AAATGTGATG CTTACCTTCA ACAGACTCAA GAACTGGTTA TACTTGAAAC CATAATGCTA 420 

CAAACTCTAG GTTTTGAGAT CACCATTGAA CACCCACACA CAGATGTGGT GAAATGTACC 480 

CAGTTAGTAA GAGCAAGCAA GGATTTGGCA CAGACATCCT ATTTCATGGC TACCAACAGT 540 

CTGCATCTTA CAACCTTCTG TCTTCAGTAC AAACCAACAG TGATAGCATG TGTATGCATT 600 

CATTTGGCTT GCAAATGGTC CAATTGGGAG ATCCCTGTAT CAACTGATGG AAAGCATTGG 660 

TGGGAATATG TGGATCCTAC AGTTACTCTA GAATTATTAG ATGAGCTAAC ACATGAGTTT 720 

CTACAAATAT TGGAGAAAAC GCCTAATAGG TTGAAGAAGA TTCGAAACTG GAGGGCTAAT 780 

CAGGCAGCTA GGAAACCAAA AGTAGATGGA CAGGTATCAG AGACACCACT TCTTGGTTCA 840

TCTTTGGTCC AGAATTCCAT TTTAGTAGAT AGTGTCACTG GTGTGCCTAC AAACCCAAGT 900 

TTTCAGAAAC CATCTACATC AGCATTCCCT GCGCCAGTAC CTCTAAATTC AGGAAATATT 960 

TCTGTTCAAG ACAGCCATAC ATCTGATAAT TTGTCAATGC TAGCAACAGG AATGCCAAGT 1020

ACTTCATACG GTTTATCATC ACACCAGGAA TGGCCTCAAC ATCAAGACTC AGCAAGGACA 1080

GAACAGCTAT ATTCACAGAA ACAGGAGACA TCTTTGTCTG GTAGCCAGTA CAACATCAAC 1140

TTCCAGCAGG GACCTTCTAT ATCACTGCAT TCAGGATTAC ATCACAGACC TGACAAAATT 1200 

TCAGATCATT CTTCTGTTAA GCAAGAATAT ACTCATAAAG CAGGGAGCAG TAAACACCAT 1260 

GGGCCAATTT CCACTACTCC AGGAATAATT CCTCAGAAAA TGTCTTTAGA TAAATATAGA 1320

GAAAAGCGTA AACTAGAAAC TCTTGATCTC GATGTAAGGG ATCATTATAT AGCTGCCCAG 1380

GTAGAACAGC AGCACAAACA AGGGCAGTCA CAGGCAGCCA GCAGCAGTTC TGTTACTTCT 1440

CCCATTAAAA TGAAAATACC TATCGCAAAT ACTGAAAAAT ACATGGCAGA TAAAAAGGAA 1500

AAGAGTGGGT CACTGAAATT ACGGATTCCA ATACCACCCA CTGATAAAAG CGCCAGTAAA 1560

GAAGAACTGA AAATGAAAAT AAAAGTTTCT TCTTCAGAAA GACACAGCTC TTCTGATGAA 1620 

GGCAGTGGGA AAAGCAAACA TTCAAGCCCA CATATTAGCA GAGACCATAA GGAGAAGCAC 1680 

AAGGAGCATC CTTCAAGCCG CCACCACACC AGCAGCCACA AGCATTCCCA CTCGCATAGT 1740 

GGCAGCAGCA GCGGTGGCAG TAAACACAGT GCCGACGGAA TACCACCCAC TGTTCTGAGG 1800 

AGTCCTGTTG GCCTGAGCAG TGATGGCATT TCCTCTAGCT CCAGCTCTTC AAGGAAGAGG 1860 

CTGCATGTCA ATGATGCATC TCACAACCAC CACTCCAAAA TGAGCAAAAG TTCCAAAAGT 1920 

TCAGGTGGGC TACGGACATC TCAGCACCTC GTGAAACTGG ACAAGAAGCC AGTGGAGACC 1980 

AACGGTCCTG ATGCCAATCA CGAGTACAGT ACAAGCAGCC AGCATATGGA CTACAAAGAC 2040 

ACATTCGACA TGCTGGACTC ACTGTTAAGT GCCCAAGGAA TGAACATGTA A 2091 

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 696 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 

Met Ala Ser Gly Arg Gly Ala Ser Ser Arg Trp Phe Phe Thr Arg Glu 
1 5 10 15

Gln Leu Glu Asn Thr Pro Ser Arg Arg Cys Gly Val Glu Ala Asp Lys
20 25 30

Glu Leu Ser Cys Arg Gln Gln Ala Ala Asn Leu Ile Gln Glu Met Gly
35 40 45

Gln Arg Leu Asn Val Ser Gln Leu Thr Ile Asn Thr Ala Ile Val Tyr
50 55 60



Met His Arg Phe Tyr Met His His Ser Phe Thr Lys Phe Asn Lys Asn 
65 70 75 80 

Ile Ile Ser Ser Thr Ala Leu Phe Leu Ala Ala Lys Val Glu Glu Gln
85 90 95

Ala Arg Lys Leu Glu His Val Ile Lys Val Ala His Ala Cys Leu His
100 105 110

Pro Leu Glu Pro Leu Leu Asp Thr Lys Cys Asp Ala Tyr Leu Gln Gln
115 120 125

Thr Gln Glu Leu Val Ile Leu Glu Thr Ile Met Leu Gln Thr Leu Gly
130 135 140

Phe Glu Ile Thr Ile Glu His Pro His Thr Asp Val Val Lys Cys Thr
145 150 155 160

Gln Leu Val Arg Ala Ser Lys Asp Leu Ala Gln Thr Ser Tyr Phe Met
165 170 175

Ala Thr Asn Ser Leu His Leu Thr Thr Phe Cys Leu Gln Tyr Lys Pro
180 185 190

Thr Val Ile Ala Cys Val Cys Ile His Leu Ala Cys Lys Trp Ser Asn
195 200 205

Trp Glu Ile Pro Val Ser Thr Asp Gly Lys His Trp Trp Glu Tyr Val
210 215 220

Asp Pro Thr Val Thr Leu Glu Leu Leu Asp Glu Leu Thr His Glu Phe
225 230 235 240

Leu Gln Ile Leu Glu Lys Thr Pro Asn Arg Leu Lys Lys Ile Arg Asn
245 250 255

Trp Arg Ala Asn Gln Ala Ala Arg Lys Pro Lys Val Asp Gly Gln Val
260 265 270

Ser Glu Thr Pro Leu Leu Gly Ser Ser Leu Val Gln Asn Ser Ile Leu
275 280 285

Val Asp Ser Val Thr Gly Val Pro Thr Asn Pro Ser Phe Gln Lys Pro
290 295 300

Ser Thr Ser Ala Phe Pro Ala Pro Val Pro Leu Asn Ser Gly Asn Ile
305 310 315 320

Ser Val Gln Asp Ser His Thr Ser Asp Asn Leu Ser Met Leu Ala Thr
325 330 335

Gly Met Pro Ser Thr Ser Tyr Gly Leu Ser Ser His Gln Glu Trp Pro
340 345 350



Gln His Gln Asp Ser Ala Arg Thr Glu Gln Leu Tyr Ser Gln Lys Gln
355 360 365

Glu Thr Ser Leu Ser Gly Ser Gln Tyr Asn Ile Asn Phe Gln Gln Gly
370 375 380

Pro Ser Ile Ser Leu His Ser Gly Leu His His Arg Pro Asp Lys Ile
385 390 395 400

Ser Asp His Ser Ser Val Lys Gln Glu Tyr Thr His Lys Ala Gly Ser
405 410 415

Ser Lys His His Gly Pro Ile Ser Thr Thr Pro Gly Ile Ile Pro Gln
420 425 430

Lys Met Ser Leu Asp Lys Tyr Arg Glu Lys Arg Lys Leu Glu Thr Leu
435 440 445

Asp Leu Asp Val Arg Asp His Tyr Ile Ala Ala Gln Val Glu Gln Gln
450 455 460

His Lys Gln Gly Gln Ser Gln Ala Ala Ser Ser Ser Ser Val Thr Ser
465 470 475 480

Pro Ile Lys Met Lys Ile Pro Ile Ala Asn Thr Glu Lys Tyr Met Ala
485 490 495

Asp Lys Lys Glu Lys Ser Gly Ser Leu Lys Leu Arg Ile Pro Ile Pro
500 505 510

Pro Thr Asp Lys Ser Ala Ser Lys Glu Glu Leu Lys Met Lys Ile Lys
515 520 525

Val Ser Ser Ser Glu Arg His Ser Ser Ser Asp Glu Gly Ser Gly Lys
530 535 540

Ser Lys His Ser Ser Pro His Ile Ser Arg Asp His Lys Glu Lys His
545 550 555 560

Lys Glu His Pro Ser Ser Arg His His Thr Ser Ser His Lys His Ser
565 570 575

His Ser His Ser Gly Ser Ser Ser Gly Gly Ser Lys His Ser Ala Asp
580 585 590

Gly Ile Pro Pro Thr Val Leu Arg Ser Pro Val Gly Leu Ser Ser Asp
595 600 605

Gly Ile Ser Ser Ser Ser Ser Ser Ser Arg Lys Arg Leu His Val Asn
610 615 620

Asp Ala Ser His Asn His His Ser Lys Met Ser Lys Ser Ser Lys Ser
625 630 635 640



Ser Gly Gly Leu Arg Thr Ser Gln His Leu Val Lys Leu Asp Lys Lys
645 650 655

Pro Val Glu Thr Asn Gly Pro Asp Ala Asn His Glu Tyr Ser Thr Ser
660 665 670

Ser Gln His Met Asp Tyr Lys Asp Thr Phe Asp Met Leu Asp Ser Leu
675 680 685

Leu Ser Ala Gln Gly Met Asn Met
690 695 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2190 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

ATGGCGTCGG GCCGTGGAGC TTCTTCTCGC TGGTTCTTTA CTCGGGAACA GCTGGAGAAC 60

ACGCCGAGCC GCCGCTGCGG AGTGGAGGCG GATAAAGAGC TCTCGTGCCG CCAGCAGGCG 120

GCCAACCTCA TCCAGGAGAT GGGACAGCGT CTCAATGTCT CTCAGCTTAC AATAAACACT 180

GCGATTGTTT ATATGCACAG GTTTTATATG CACCATTCTT TCACCAAATT CAACAAAAAT 240

ATAATATCGT CTACTGCATT ATTTTTGGCT GCAAAAGTGG AAGAACAGGC TCGAAAACTT 300

GAACATGTTA TCAAAGTAGC ACATGCTTGT CTTCATCCTC TAGAGCCACT GCTGGATACT 360

AAATGTGATG CTTACCTTCA ACAGACTCAA GAACTGGTTA TACTTGAAAC CATAATGCTA 420

CAAACTCTAG GTTTTGAGAT CACCATTGAA CACCCACACA CAGATGTGGT GAAATGTACC 480

CAGTTAGTAA GAGCAAGCAA GGATTTGGCA CAGACATCCT ATTTCATGGC TACCAACAGT 540

CTGCATCTTA CAACCTTCTG TCTTCAGTAC AAACCAACAG TGATAGCATG TGTATGCATT 600

CATTTGGCTT GCAAATGGTG CAATTGGGAG ATCCCTGTAT CAACTGATGG AAAGCATTGG 660

TGGGAATATG TGGATCCTAC AGTTACTCTA GAATTATTAG ATGAGCTAAC ACATGAGTTT 720

CTACAAATAT TGGAGAAAAC GCCTAATAGG TTGAAGAAGA TTCGAAACTG GAGGGCTAAT 780

CAGGCAGCTA GGAAACCAAA AGTAGATGGA CAGGTATCAG AGACACCACT TCTTGGTTCA 840

TCTTTGGTCC AGAATTCCAT TTTAGTAGAT AGTGTCACTG GTGTGCCTAC AAACCCAAGT 900

TTTCAGAAAC CATCTACATC AGCATTCCCT GCGCCAGTAC CTCTAAATTC AGGAAATATT 960 

TCTGTTCAAG ACAGCCATAC ATCTGATAAT TTGTCAATGC TAGCAACAGG AATGCCAAGT 1020 

ACTTCATACG GTTTATCATC ACACCAGGAA TGGCCTCAAC ATCAAGACTC AGCAAGGACA 1080

GAACAGCTAT ATTCACAGAA ACAGGAGACA TCTTTGTCTG GTAGCCAGTA CAACATCAAC 1140 

TTCCAGCAGG GACCTTCTAT ATCACTGCAT TCAGGATTAC ATCACAGACC TGACAAAATT 1200 

TCAGATCATT CTTCTGTTAA GCAGGAATAT ACTCATAAAG CAGGGAGCAG TAAACACCAT 1260 

GGGCCAATTT CCACTACTCC AGGAATAATT CCTCAGAAAA TGTCTTTAGA TAAATATAGA 1320

GAAAAGCGTA AACTAGAAAC TCTTGATCTC GATGTAAGGG ATCATTATAT AGCTGCCCAG 1380 

GTAGAACAGC AGCACAAACA AGGGCAGTCA CAGGCAGCCA GCAGCAGTTC TGTTACTTCT 1440 

CCCATTAAAA TGAAAATACC TATCGCAAAT ACTGAAAAAT ACATGGCAGA TAAAAAGGAA 1500 

AAGAGTGGGT CACTGAAATT ACGGATTCCA ATACCACCCA CTGATAAAAG CGCCAGTAAA 1560 

GAAGAACTGA AAATGAAAAT AAAAGTTTCT TCTTCAGAAA GACACAGCTC TTCTGATGAA 1620 

GGCAGTGGGA AAAGCAAACA TTCAAGCCCA CATATTAGCA GAGACCATAA GGAGAAGCAC 1680

AAGGAGCATC CTTCAAGCCG CCACCACACC AGCAGCCACA AGCATTCCCA CTCGCATAGT 1740 

GGCAGCAGCA GCGGTGGCAG TAAACACAGT GCCGACGGAA TACCACCCAC TGTTCTGAGG 1800 

AGTCCTGTTG GCCTGAGCAG TGATGGCATT TCCTCTAGCT CCAGCTCTTC AAGGAAGAGG 1860 

CTGCATGTCA ATGATGCATC TCACAACCAC CACTCCAAAA TGAGCAAAAG TTCCAAAAGT 1920

TCAGGTAGTT CATCTAGTTC TTCCTCCTCT GTTAAGCAGT ATATATCCTC TCACAACTCT 1980

GTTTTTAACC ATCCCTTACC CCTCCTCCCC TGTCACATAC CAGGTGGGCT ACGGACATCT 2040

CTGCACCTCG TGAAACTGGA CAAGAAGCCA GTGGAGACCA ACGGTCCTGA TGCCAATCAC 2100

GAGTACAGTA CAAGCAGCCA GCATATGGAC TACAAAGACA CATTCGACAT GCTGGACTCA 2160 

CTGTTAAGTG CCCAAGGAAT GAACATGTAA 2190 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 729 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

Met Ala Ser Gly Arg Gly Ala Ser Ser Arg Trp Phe Phe Thr Arg Glu 
1 5 10 15

Gln Leu Glu Asn Thr Pro Ser Arg Arg Cys Gly Val Glu Ala Asp Lys
20 25 30

Glu Leu Ser Cys Arg Gln Gln Ala Ala Asn Leu Ile Gln Glu Met Gly
35 40 45

Gln Arg Leu Asn Val Ser Gln Leu Thr Ile Asn Thr Ala Ile Val Tyr
50 55 60

Met His Arg Phe Tyr Met His His Ser Phe Thr Lys Phe Asn Lys Asn
65 70 75 80

Ile Ile Ser Ser Thr Ala Leu Phe Leu Ala Ala Lys Val Glu Glu Gln
85 90 95

Ala Arg Lys Leu Glu His Val Ile Lys Val Ala His Ala Cys Leu His
100 105 110

Pro Leu Glu Pro Leu Leu Asp Thr Lys Cys Asp Ala Tyr Leu Gln Gln
115 120 125

Thr Gln Glu Leu Val Ile Leu Glu Thr Ile Met Leu Gln Thr Leu Gly
130 135 140

Phe Glu Ile Thr Ile Glu His Pro His Thr Asp Val Val Lys Cys Thr
145 150 155 160

Gln Leu Val Arg Ala Ser Lys Asp Leu Ala Gln Thr Ser Tyr Phe Met
165 170 175

Ala Thr Asn Ser Leu His Leu Thr Thr Phe Cys Leu Gln Tyr Lys Pro
180 185 190

Thr Val Ile Ala Cys Val Cys Ile His Leu Ala Cys Lys Trp Ser Asn
195 200 205

Trp Glu Ile Pro Val Ser Thr Asp Gly Lys His Trp Trp Glu Tyr Val
210 215 220

Asp Pro Thr Val Thr Leu Glu Leu Leu Asp Glu Leu Thr His Glu Phe
225 230 235 240

Leu Gln Ile Leu Glu Lys Thr Pro Asn Arg Leu Lys Lys Ile Arg Asn
245 250 255

Trp Arg Ala Asn Gln Ala Ala Arg Lys Pro Lys Val Asp Gly Gln Val
260 265 270

Ser Glu Thr Pro Leu Leu Gly Ser Ser Leu Val Gln Asn Ser Ile Leu
275 280 285



Val Asp Ser Val Thr Gly Val Pro Thr Asn Pro Ser Phe Gln Lys Pro
290 295 300

Ser Thr Ser Ala Phe Pro Ala Pro Val Pro Leu Asn Ser Gly Asn Ile
305 310 315 320

Ser Val Gln Asp Ser His Thr Ser Asp Asn Leu Ser Met Leu Ala Thr
325 330 335

Gly Met Pro Ser Thr Ser Tyr Gly Leu Ser Ser His Gln Glu Trp Pro
340 345 350

Gln His Gln Asp Ser Ala Arg Thr Glu Gln Leu Tyr Ser Gln Lys Gln
355 360 365

Glu Thr Ser Leu Ser Gly Ser Gln Tyr Asn Ile Asn Phe Gln Gln Gly
370 375 380

Pro Ser Ile Ser Leu His Ser Gly Leu His His Arg Pro Asp Lys Ile
385 390 395 400

Ser Asp His Ser Ser Val Lys Gln Glu Tyr Thr His Lys Ala Gly Ser
405 410 415

Ser Lys His His Gly Pro Ile Ser Thr Thr Pro Gly Ile Ile Pro Gln
420 425 430

Lys Met Ser Leu Asp Lys Tyr Arg Glu Lys Arg Lys Leu Glu Thr Leu
435 440 445

Asp Leu Asp Val Arg Asp His Tyr Ile Ala Ala Gln Val Glu Gln Gln
450 455 460

His Lys Gln Gly Gln Ser Gln Ala Ala Ser Ser Ser Ser Val Thr Ser
465 470 475 480

Pro Ile Lys Met Lys Ile Pro Ile Ala Asn Thr Glu Lys Tyr Met Ala
485 490 495

Asp Lys Lys Glu Lys Ser Gly Ser Leu Lys Leu Arg Ile Pro Ile Pro
500 505 510

Pro Thr Asp Lys Ser Ala Ser Lys Glu Glu Leu Lys Met Lys Ile Lys
515 520 525

Val Ser Ser Ser Glu Arg His Ser Ser Ser Asp Glu Gly Ser Gly Lys
530 535 540

Ser Lys His Ser Ser Pro His Ile Ser Arg Asp His Lys Glu Lys His
545 550 555 560



Lys Glu His Pro Ser Ser Arg His His Thr Ser Ser His Lys His Ser
565 570 575

His Ser His Ser Gly Ser Ser Ser Gly Gly Ser Lys His Ser Ala Asp
580 585 590

Gly Ile Pro Pro Thr Val Leu Arg Ser Pro Val Gly Leu Ser Ser Asp
595 600 605

Gly Ile Ser Ser Ser Ser Ser Ser Ser Arg Lys Arg Leu His Val Asn
610 615 620

Asp Ala Ser His Asn His His Ser Lys Met Ser Lys Ser Ser Lys Ser
625 630 635 640

Ser Gly Ser Ser Ser Ser Ser Ser Ser Ser Val Lys Gln Tyr Ile Ser
645 650 655

Ser His Asn Ser Val Phe Asn His Pro Leu Pro Leu Leu Pro Cys His
660 665 670

Ile Pro Gly Gly Leu Arg Thr Ser Gln His Leu Val Lys Leu Asp Lys
675 680 685

Lys Pro Val Glu Thr Asn Gly Pro Asp Ala Asn His Glu Tyr Ser Thr
690 695 700

Ser Ser Gln His Met Asp Tyr Lys Asp Thr Phe Asp Met Leu Asp Ser
705 710 715 720

Leu Leu Ser Ala Gln Gly Met Asn Met
725



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2360 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

GGAAGTGCCT GCAACCTTCG CCGCTGCCTT CTGGTTGAAG CACTATGGAG GGAGAGAGGA 60 

AGAACAACAA CAAACGGTGG TATTTCACTC GAGAACAGCT GGAAAATAGC CCATCCCGTC 120 

GTTTTGGCGT GGACCCAGAT AAAGAACTTT CTTATCGCCA GCAGGCGGCC AATCTGCTTC 180

AGGACATGGG GCAGCGTCTT AACGTCTCAC AATTGACTAT CAACACTGCT ATAGTATACA 240

TGCATCGATT CTACATGATT CAGTCCTTCA CACGGTTCCC TGGAAATTCT GTGGCTCCAG 300

CAGCCTTGTT TCTAGCAGCT AAAGTGGAGG AGCAGCCCAA AAAATTGGAA CATGTCATCA 360

AGGTAGCACA TACTTGTCTC CATCCTCAGG AATCCCTTCC TGATACTAGA AGTGAGGCTT 420 

ATTTGCAACA AGTTCAAGAT CTGGTCATTT TAGAAAGCAT AATTTTGCAG ACTTTAGGCT 480 

TTGAACTAAC AATTGATCAC CCACATACTC ATGTAGTAAA GTGCACTCAA CTTGTTCGAG 540 

CAAGCAAGGA CTTAGCACAG ACTTCTTACT TCATGGCAAC CAACAGCCTG CATTTGACCA 600 

CATTTAGCCT GCAGTACACA CCTCCTGTGG TGGCCTGTGT CTGCATTCAC CTGGCTTGCA 660 

AGTGGTCCAA TTGGGAGATC CCAGTCTCAA CTGACGGGAA GCACTGGTGG GAGTATGTTG 720

ACGCCACTGT GACCTTGGAA CTTTTAGATG AACTGACACA TGAGTTTCTA CAGATTTTGG 780 

AGAAAACTCC CAACAGGCTC AAACGCATTT GGAATTGGAG GGCATGCGAG GCTGCCAAGA 840 

AAACAAAAGC AGATGACCGA GGAACAGATG AAAAGACTTC AGAGCAGACA ATCCTCAATA 900 

TGATTTCCCA GAGCTCTTCA GACACAACCA TTGCAGGTTT AATGAGCATG TCAACTTCTA 960 

CCACAAGTGC AGTGCCTTCC CTGCCAGTCT CCGAAGAGTC ATCCAGCAAC TTAACCAGTG 1020 

TGGAGATGTT GCCGGGCAAG CGTTGGCTGT CCTCCCAACC TTCTTTCAAA CTAGAACCTA 1080 

CTCAGGGTCA TCGGACTAGT GAGAATTTAG CACTTACAGG AGTTGATCAT TCCTTACCAC 1140 

AGGATGGTTC AAATGCATTT ATTTCCCAGA AGCAGAATAG TAAGAGTGTG CCATCAGCTA 1200 

AAGTGTCACT GAAAGAATAC CGCGCGAAGC ATGCAGAAGA ATTGGCTGCC CAGAAGAGGC 1260 

AACTGGAGAA CATGGAAGCC AATGTGAAGT CACAATATGC ATATGCTGCC CAGAATCTCC 1320 

TTTCTCATCA TGATAGCCAT TCTTCAGTCA TTCTAAAAAT GCCCATAGAG GGTTCAGAAA 1380 

ACCCCGAGCG GCCTTTTCTG GAAAAGGCTG ACAAAACAGC TCTCAAAATG AGAATCCCAG 1440 

TGGCAGGTGG AGATAAAGCT GCGTCTTCAA AACCAGAGGA GATAAAAATG CGCATAAAAG 1500 

TCCATGCTGC AGCTGATAAG CACAATTCTG TAGAGGACAG TGTTACAAAG AGCCGAGAGC 1560

ACAAAGAAGA GCGCAAGACT CACCCATCTA ATCATCATCA TCATCATAAT CACCACTCAC 1620 

ACAAGCACTC TCATTCCCAA CTTCCAGTTG GTACTGGGAA CAAACGTCCT GGTGATCCAA 1680 

AACATAGTAG CCAGACAAGC AACTTAGCAC ATAAAACCTA TAGCTTGTCT AGTTCTTTTT 1740 

CCTCTTCCAG TTCTACTCGT AAAAGGGGAC CCTCTGAAGA GACTGGAGGG GCTGTGTTTG 1800 

ATCATCCAGC CAAGATTGCC AAGAGTACTA AATCCTCTTC CCTAAATTTC TCCTTCCCTT 1860 

CACTTCCTAC AATGGGTCAG ATGCCTGGGC ATAGCTCAGA CACAAGTGGC CTTTCCTTTT 1920

CACAGCCCAG CTGTAAAACT CGTGTCCCTC ATTCGAAACT GGATAAAGGG CCCACTGGGG 1980

CCAATGGTCA CAACACGACC CAGACAATAG ACTATCAAGA CACTGTGAAT ATGCTTCACT 2040 

CCCTGCTCAG TGCCCAGGGT GTTCAGCCCA CTCAGCCCAC TGCATTTGAA TTTGTTCGTC 2100 

CTTATAGTGA CTATCTGAAT CCTCGGTCTG GTGGAATCTC CTCGAGATCT GGCAATACAG 2160 

ACAAACCCCG GCCACCACCT CTGCCATCAG AACCTCCTCC ACCACTTCCA CCCCTTCCTA 2220 

AGTAAAAAAA GAAAAAGAAG AGGAGAAAAA AACTTCTTTA AAAAAACACA TAATTTTTCT 2280

TTTTTTTTTG GGGAAAAAAA AATTTTTTTT AAAATTTTTT CCCCAAGGGA CGGGGGAAAA 2340 

TTTTATTTTT AAAATTTTTT 2360 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2181 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

ATGGAGGGAG AGAGGAAGAA CAACAACAAA CGGTGGTATT TCACTCGAGA ACAGCTGGAA 60 

AATAGCCCAT CCCGTCGTTT TGGCGTGGAC CCAGATAAAG AACTTTCTTA TCGCCAGCAG 120

GCGGCCAATC TGCTTCAGGA CATGGGGCAG CGTCTTAACG TCTCACAATT GACTATCAAC 180 

ACTGCTATAG TATACATGCA TCGATTCTAC ATGATTCAGT CCTTCACACG GTTCCCTGGA 240 

AATTCTGTGG CTCCAGCAGC CTTGTTTCTA GCAGCTAAAG TGGAGGAGCA GCCCAAAAAA 300 

TTGGAACATG TCATCAAGGT AGCACATACT TGTCTCCATC CTCAGGAATC CCTTCCTGAT 360 

ACTAGAAGTG AGGCTTATTT GCAACAAGTT CAAGATCTGG TCATTTTAGA AAGCATAATT 420

TTGCAGACTT TAGGCTTTGA ACTAACAATT GATCACCCAC ATACTCATGT AGTAAAGTGC 480 

ACTCAACTTG TTCGAGCAAG CAAGGACTTA GCACAGACTT CTTACTTCAT GGCAACCAAC 540 

AGCCTGCATT TGACCACATT TAGCCTGCAG TACACACCTC CTGTGGTGGC CTGTGTCTGC 600

ATTCACCTGG CTTGCAAGTG GTCCAATTGG GAGATCCCAG TCTCAACTGA CGGGAAGCAC 660

TGGTGGGAGT ATGTTGACGC CACTGTGACC TTGGAACTTT TAGATGAACT GACACATGAG 720

TTTCTACAGA TTTTGGAGAA AACTCCCAAC AGGCTCAAAC GCATTTGGAA TTGGAGGGCA 780

TGCGAGGCTG CCAAGAAAAC AAAAGCAGAT GACCGAGGAA CAGATGAAAA GACTTCAGAG 840

CAGACAATCC TCAATATGAT TTCCCAGAGC TCTTCAGACA CAACCATTGC AGGTTTAATG 900 

AGCATGTCAA CTTCTACCAC AAGTGCAGTG CCTTCCCTGC CAGTCTCCGA AGAGTCATCC 960 

AGCAACTTAA CCAGTGTGGA GATGTTGCCG GGCAAGCGTT GGCTGTCCTC CCAACCTTCT 1020 

TTCAAACTAG AACCTACTCA GGGTCATCGG ACTAGTGAGA ATTTAGCACT TACAGGAGTT 1080 

GATCATTCCT TACCACAGGA TGGTTCAAAT GCATTTATTT CCCAGAAGCA GAATAGTAAG 1140 

AGTGTGCCAT CAGCTAAAGT GTCACTGAAA GAATACCGCG CGAAGCATGC AGAAGAATTG 1200 

GCTGCCCAGA AGAGGCAACT GGAGAACATG GAAGCCAATG TGAAGTCACA ATATGCATAT 1260 

GCTGCCCAGA ATCTCCTTTC TCATCATGAT AGCCATTCTT CAGTCATTCT AAAAATGCCC 1320

ATAGAGGGTT CAGAAAACCC CGAGCGGCCT TTTCTGGAAA AGGCTGACAA AACAGCTCTC 1380

AAAATGAGAA TCCCAGTGGC AGGTGGAGAT AAAGCTGCGT CTTCAAAACC AGAGGAGATA 1440

AAAATGCGCA TAAAAGTCCA TGCTGCAGCT GATAAGCACA ATTCTGTAGA GGACAGTGTT 1500 

ACAAAGAGCC GAGAGCACAA AGAAGAGCGC AAGACTCACC CATCTAATCA TCATCATCAT 1560 

CATAATCACC ACTCACACAA GCACTCTCAT TCCCAACTTC CAGTTGGTAC TGGGAACAAA 1620 

CGTCCTGGTG ATCCAAAACA TAGTAGCCAG ACAAGCAACT TAGCACATAA AACCTATAGC 1680 

TTGTCTAGTT CTTTTTCCTC TTCCAGTTCT ACTCGTAAAA GGGGACCCTC TGAAGAGACT 1740

GGAGGGGCTG TGTTTGATCA TCCAGCCAAG ATTGCCAAGA GTACTAAATC CTCTTCCCTA 1800 

AATTTCTCCT TCCCTTCACT TCCTACAATG GGTCAGATGC CTGGGCATAG CTCAGACACA 1860 

AGTGGCCTTT CCTTTTCACA GCCCAGCTGT AAAACTCGTG TCCCTCATTC GAAACTGGAT 1920 

AAAGGGCCCA CTGGGGCCAA TGGTCACAAC ACGACCCAGA CAATAGACTA TCAAGACACT 1980 

GTGAATATGC TTCACTCCCT GCTCAGTGCC CAGGGTGTTC AGCCCACTCA GCCCACTGCA 2040 

TTTGAATTTG TTCGTCCTTA TAGTGACTAT CTGAATCCTC GGTCTGGTGG AATCTCCTCG 2100 

AGATCTGGCA ATACAGACAA ACCCCGGCCA CCACCTCTGC CATCAGAACC TCCTCCACCA 2160 

CTTCCACCCC TTCCTAAGTA A 2181 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 726 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 



Met Glu Gly Glu Arg Lys Asn Asn Asn Lys Arg Trp Tyr Phe Thr Arg
1 5 10 15

Glu Gln Leu Glu Asn Ser Pro Ser Arg Arg Phe Gly Val Asp Pro Asp
20 25 30

Lys Glu Leu Ser Tyr Arg Gln Gln Ala Ala Asn Leu Leu Gln Asp Met
35 40 45

Gly Gln Arg Leu Asn Val Ser Gln Leu Thr Ile Asn Thr Ala Ile Val
50 55 60



Tyr Met His Arg Phe Tyr Met Ile Gln Ser Phe Thr Arg Phe Pro Gly
65 70 75 80

Asn Ser Val Ala Pro Ala Ala Leu Phe Leu Ala Ala Lys Val Glu Glu
85 90 95

Gln Pro Lys Lys Leu Glu His Val Ile Lys Val Ala His Thr Cys Leu
100 105 110

His Pro Gln Glu Ser Leu Pro Asp Thr Arg Ser Glu Ala Tyr Leu Gln
115 120 125

Gln Val Gln Asp Leu Val Ile Leu Glu Ser Ile Ile Leu Gln Thr Leu
130 135 140

Gly Phe Glu Leu Thr Ile Asp His Pro His Thr His Val Val Lys Cys
145 150 155 160

Thr Gln Leu Val Arg Ala Ser Lys Asp Leu Ala Gln Thr Ser Tyr Phe
165 170 175

Met Ala Thr Asn Ser Leu His Leu Thr Thr Phe Ser Leu Gln Tyr Thr
180 185 190

Pro Pro Val Val Ala Cys Val Cys Ile His Leu Ala Cys Lys Trp Ser
195 200 205

Asn Trp Glu Ile Pro Val Ser Thr Asp Gly Lys His Trp Trp Glu Tyr
210 215 220

Val Asp Ala Thr Val Thr Leu Glu Leu Leu Asp Glu Leu Thr His Glu
225 230 235 240

Phe Leu Gln Ile Leu Glu Lys Thr Pro Asn Arg Leu Lys Arg Ile Trp
245 250 255

Asn Trp Arg Ala Cys Glu Ala Ala Lys Lys Thr Lys Ala Asp Asp Arg
260 265 270

Gly Thr Asp Glu Lys Thr Ser Glu Gln Thr Ile Leu Asn Met Ile Ser
275 280 285

Gln Ser Ser Ser Asp Thr Thr Ile Ala Gly Leu Met Ser Met Ser Thr
290 295 300

Ser Thr Thr Ser Ala Val Pro Ser Leu Pro Val Ser Glu Glu Ser Ser
305 310 315 320

Ser Asn Leu Thr Ser Val Glu Met Leu Pro Gly Lys Arg Trp Leu Ser
325 330 335

Ser Gln Pro Ser Phe Lys Leu Glu Pro Thr Gln Gly His Arg Thr Ser
340 345 350

Glu Asn Leu Ala Leu Thr Gly Val Asp His Ser Leu Pro Gln Asp Gly
355 360 365

Ser Asn Ala Phe Ile Ser Gln Lys Gln Asn Ser Lys Ser Val Pro Ser
370 375 380

Ala Lys Val Ser Leu Lys Glu Tyr Arg Ala Lys His Ala Glu Glu Leu
385 390 395 400

Ala Ala Gln Lys Arg Gln Leu Glu Asn Met Glu Ala Asn Val Lys Ser
405 410 415

Gln Tyr Ala Tyr Ala Ala Gln Asn Leu Leu Ser His His Asp Ser His
420 425 430

Ser Ser Val Ile Leu Lys Met Pro Ile Glu Gly Ser Glu Asn Pro Glu
435 440 445

Arg Pro Phe Leu Glu Lys Ala Asp Lys Thr Ala Leu Lys Met Arg Ile
450 455 460

Pro Val Ala Gly Gly Asp Lys Ala Ala Ser Ser Lys Pro Glu Glu Ile
465 470 475 480

Lys Met Arg Ile Lys Val His Ala Ala Ala Asp Lys His Asn Ser Val
485 490 495

Glu Asp Ser Val Thr Lys Ser Arg Glu His Lys Glu Glu Arg Lys Thr
500 505 510

His Pro Ser Asn His His His His His Asn His His Ser His Lys His
515 520 525

Ser His Ser Gln Leu Pro Val Gly Thr Gly Asn Lys Arg Pro Gly Asp
530 535 540
Pro Lys His Ser Ser Gln Thr Ser Asn Leu Ala His Lys Thr Tyr Ser
545 550 555 560

Leu Ser Ser Ser Phe Ser Ser Ser Ser Ser Thr Arg Lys Arg Gly Pro
565 570 575

Ser Glu Glu Thr Gly Gly Ala Val Phe Asp His Pro Ala Lys Ile Ala
580 585 590

Lys Ser Thr Lys Ser Ser Ser Leu Asn Phe Ser Phe Pro Ser Leu Pro
595 600 605

Thr Met Gly Gln Met Pro Gly His Ser Ser Asp Thr Ser Gly Leu Ser
610 615 620

Phe Ser Gln Pro Ser Cys Lys Thr Arg Val Pro His Ser Lys Leu Asp
625 630 635 640

Lys Gly Pro Thr Gly Ala Asn Gly His Asn Thr Thr Gln Thr Ile Asp
645 650 655

Tyr Gln Asp Thr Val Asn Met Leu His Ser Leu Leu Ser Ala Gln Gly
660 665 670

Val Gln Pro Thr Gln Pro Thr Ala Phe Glu Phe Val Arg Pro Tyr Ser
675 680 685

Asp Tyr Leu Asn Pro Arg Ser Gly Gly Ile Ser Ser Arg Ser Gly Asn
690 695 700

Thr Asp Lys Pro Arg Pro Pro Pro Leu Pro Ser Glu Pro Pro Pro Pro
705 710 715 720

Leu Pro Pro Leu Pro Lys
725



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 



TTCCCACCAA TGCTTTCC 



(2) INFORMATION FOR SEQ ID NO: 52: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

CCATCAGTTG ATACAGGGAT CT



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

GGAATTCAGA AGGTTGTAAG ATGC



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

ACACACAGAT GTGGTGAAAT GTACCCA



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

GCATCTTACA ACCTTCTG



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

GGAATTCATG GAAAGCATTG GTGGGAAT



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

CCTCCACTAC TGGTTTGCCT GG



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

GGACTAGTAT AAATATGGCG TCGGGCCGTG



(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

GGAGATCTTA CATGTTCATT CCTTGGG



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

GGAGACAAGT ATGTGCTACC TTGATGACA



(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

GGAATTCGGG CTGCTCCTCC ACTTTAG



(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:

GGAATTCGCT GCTGGAGCCA CAGAA



(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:

GTGTCACTGA AAGAATACCG



(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

GGAATTCAGG TGGAGATAAA GCTGC 25



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

GCTCTAGATA AATATGGAGG GAGAGAGGAA



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
GGAATTCTTA CTTAGGAAGG GGTGGAAGTG 



(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GGAATTCTTA CTTAGGAAGG GGTGGAAGTG GTGGAGGAGG TTAC 
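
The LENGTH fields of the entries above can be cross-checked against the printed sequences by counting bases once the spacing between ten-base groups is removed. The following is a minimal illustrative sketch and is not part of the original listing; the helper name `count_bases` is hypothetical, and the two sequences are copied from SEQ ID NO: 66 and SEQ ID NO: 67 as printed.

```python
def count_bases(printed_sequence: str) -> int:
    """Count nucleotides in a sequence printed in space-separated groups."""
    # Drop every whitespace character (spaces between groups, line breaks).
    compact = "".join(printed_sequence.split())
    # Reject anything that is not a plain DNA base so OCR debris is noticed.
    unexpected = set(compact.upper()) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return len(compact)


SEQ_ID_66 = "GGAATTCTTA CTTAGGAAGG GGTGGAAGTG"
SEQ_ID_67 = "GGAATTCTTA CTTAGGAAGG GGTGGAAGTG GTGGAGGAGG TTAC"

print(count_bases(SEQ_ID_66))  # declared LENGTH: 30 base pairs
print(count_bases(SEQ_ID_67))  # declared LENGTH: 44 base pairs
```

A count that differs from the declared LENGTH would suggest a transcription error in either the field or the sequence line.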



(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

Ala Cys Ser Tyr Ser Pro Thr Ser Pro Ser Tyr Ser Pro Thr Ser Pro
1 5 10 15

Ser Tyr Ser Pro Thr Ser Pro Ser Lys Lys
20 25



WHAT IS CLAIMED IS:



1. A DNA segment comprising an isolated coding region that encodes a substantially full
length P-TEFb subunit, wherein the coding region is characterized as: 

(a) encoding a substantially full length P-TEFb kinase subunit having the amino acid 
sequence of SEQ ID NO:2; or 

(b) encoding a substantially full length P-TEFb large subunit that includes a 
contiguous sequence of at least about 7 amino acids from SEQ ID NO:4, SEQ ID 
NO:45, SEQ ID NO:47 or SEQ ID NO:50; or as a substantially full length 
coding region that hybridizes to the nucleotide sequence of SEQ ID NO:3, 
SEQ ID NO:43 or SEQ ID NO:48 under stringent hybridization conditions. 



2. The DNA segment of claim 1, wherein said isolated coding region encodes a substantially
full length P-TEFb kinase subunit having the amino acid sequence of SEQ ID NO:2. 



3. The DNA segment of claim 2, wherein said isolated coding region has the nucleotide 
sequence from position 115 to position 1327 of SEQ ID NO:1.



4. The DNA segment of claim 1, wherein said isolated coding region encodes a substantially
full length P-TEFb large subunit that includes a contiguous sequence of at least about 7 amino 
acids from SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50; or is a 
substantially full length coding region that hybridizes to the nucleotide sequence of SEQ ID 
NO:3, SEQ ID NO:43 or SEQ ID NO:48 under stringent hybridization conditions. 

5. The DNA segment of claim 1, wherein said isolated coding region encodes a substantially
full length P-TEFb large subunit that includes a contiguous sequence of at least about 7 amino
acids from SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50 and wherein said
coding region hybridizes to the nucleotide sequence of SEQ ID NO:3, SEQ ID NO:43 or
SEQ ID NO:48 under stringent hybridization conditions.



6. The DNA segment of claim 4, wherein said isolated coding region encodes a substantially 
full length P-TEFb large subunit that includes a contiguous sequence of at least about 7 amino 
acids from SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50.



7. The DNA segment of claim 6, wherein said isolated coding region encodes a P-TEFb
large subunit having the amino acid sequence of SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47
or SEQ ID NO:50.



8. The DNA segment of claim 7, wherein said isolated coding region encodes a P-TEFb
large subunit having the amino acid sequence of SEQ ID NO:45.

9. The DNA segment of claim 7, wherein said isolated coding region encodes a P-TEFb 
large subunit having the amino acid sequence of SEQ ID NO:47. 


10. The DNA segment of claim 7, wherein said isolated coding region encodes a P-TEFb 
large subunit having the amino acid sequence of SEQ ID NO:50. 

11. The DNA segment of claim 4, wherein said isolated coding region is a substantially full
length coding region that hybridizes to the nucleotide sequence of SEQ ID NO:3, SEQ ID
NO:43 or SEQ ID NO:48 under stringent hybridization conditions. 

12. The DNA segment of claim 11, wherein said isolated coding region has the nucleotide 
sequence of SEQ ID NO:44. 



13. The DNA segment of claim 11, wherein said isolated coding region has the nucleotide 
sequence of SEQ ID NO:46. 



14. The DNA segment of claim 11, wherein said isolated coding region has the nucleotide 
sequence of SEQ ID NO:49. 

15. The DNA segment of claim 1, wherein said DNA segment comprises a first coding 
region that encodes a substantially full length P-TEFb kinase subunit and a second coding region 
that encodes a substantially full length P-TEFb large subunit. 

16. The DNA segment of claim 15, wherein said second coding region encodes a P-TEFb 
large subunit that has the amino acid sequence of SEQ ID NO:45, SEQ ID NO:47 or SEQ ID 
NO:50. 



17. The DNA segment of claim 16, wherein said first coding region encodes a P-TEFb 
kinase subunit that has the amino acid sequence of SEQ ID NO:6. 

18. The DNA segment of claim 16, wherein said second coding region has the nucleotide
sequence of SEQ ID NO:44, SEQ ID NO:46 or SEQ ID NO:49, and wherein said first coding
region has the nucleotide sequence of SEQ ID NO:5. 

19. The DNA segment of claim 1, wherein said isolated coding region is operatively attached 
to a second coding region that encodes a selected peptide or protein sequence, said DNA segment 
encoding a P-TEFb subunit fusion protein. 

20. The DNA segment of claim 1, operatively positioned under the control of a promoter. 

21. The DNA segment of claim 20, further defined as a recombinant vector.

22. The DNA segment of claim 20, comprised within a recombinant host cell. 

23. An expression system comprising:

(a) a first expression unit comprising, under the transcriptional control of a promoter, 
a first coding region that encodes a substantially full length P-TEFb kinase
subunit that includes a contiguous sequence of at least about 7 amino acids from 
SEQ ID NO:2 or SEQ ID NO:6; and 

(b) a second expression unit comprising, under the transcriptional control of a
promoter, a second coding region that encodes a substantially full length P-TEFb 
large subunit that includes a contiguous sequence of at least about 7 amino acids 
from SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50. 

24. The expression system of claim 23, wherein said first and said second expression units 
are comprised on a single expression vector. 

25. The expression system of claim 23, wherein said first and said second expression units 
are comprised on two distinct expression vectors. 

26. The expression system of claim 23, wherein said expression system is comprised within a 
recombinant host cell. 



27. A recombinant host cell comprising at least a first DNA segment in accordance with 
claim 1. 



28. The recombinant host cell of claim 27, wherein said cell is a prokaryotic host cell. 

29. The recombinant host cell of claim 27, wherein said cell is a eukaryotic host cell. 

30. The recombinant host cell of claim 27, wherein said cell further comprises an HIV Tat 
protein. 

31. The recombinant host cell of claim 27, wherein said cell comprises a first DNA segment
that encodes a substantially full length P-TEFb kinase subunit and a second DNA segment that
encodes a substantially full length P-TEFb large subunit.

32. The recombinant host cell of claim 31, wherein said first and second DNA segments are 
comprised within a single expression vector. 


33. A method for detecting P-TEFb nucleic acids in a sample, comprising obtaining sample 
nucleic acids from a sample suspected of containing P-TEFb nucleic acids; contacting said
sample nucleic acids with a nucleic acid segment that hybridizes to the sequence of SEQ ID
NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:43 or SEQ ID NO:48, under conditions
effective to allow hybridization of substantially complementary nucleic acids; and detecting the
hybridized complementary nucleic acids thus formed.

34. The method of claim 33, wherein the sample nucleic acids are obtained from a sample
suspected of containing a tumor cell.

35. A method of using a DNA segment that encodes a substantially full length P-TEFb 
subunit, comprising expressing at least a first DNA segment in accordance with claim 1 in a
recombinant host cell and collecting the P-TEFb subunit expressed by said cell. 

36. A composition comprising at least a first isolated, substantially full length P-TEFb
protein subunit that includes a contiguous sequence of at least about 7 amino acids from SEQ ID 
NO:2, SEQ ID NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50. 
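
The "contiguous sequence of at least about 7 amino acids" criterion recited above can be illustrated by a simple shared-substring check. The following is a minimal sketch under stated assumptions and is not a construction of the claim language: the function name `shares_contiguous_run` is hypothetical, the word "about" is approximated by an exact threshold of 7, sequences are written in one-letter code, and the reference is only the first 17 residues of SEQ ID NO:2 as printed in the sequence listing.

```python
def shares_contiguous_run(candidate: str, reference: str, min_len: int = 7) -> bool:
    """Return True if candidate and reference share a run of min_len residues."""
    if min_len < 1:
        raise ValueError("min_len must be positive")
    # Collect every window of min_len residues found in the reference.
    reference_windows = {
        reference[i:i + min_len] for i in range(len(reference) - min_len + 1)
    }
    # Slide the same window along the candidate and look for any match.
    return any(
        candidate[i:i + min_len] in reference_windows
        for i in range(len(candidate) - min_len + 1)
    )


# First 17 residues of SEQ ID NO:2 (Met Ala His Met Ser His Met Leu Gln Gln
# Pro Ser Gly Ser Thr Pro Ser) in one-letter code.
SEQ2_PREFIX = "MAHMSHMLQQPSGSTPS"

print(shares_contiguous_run("AAHMLQQPSAA", SEQ2_PREFIX))   # shares HMLQQPS
print(shares_contiguous_run("GGGMAHMSHGGG", SEQ2_PREFIX))  # longest shared run is 6
```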



37. The composition of claim 36, wherein said substantially full length P-TEFb subunit is 
operatively attached to a selected peptide or protein sequence. 

38. The composition of claim 36, wherein said composition comprises a substantially full 
length P-TEFb kinase subunit in operative association with a substantially full length P-TEFb 
large subunit. 

39. The composition of claim 36, further comprising an HIV Tat protein. 

40. An isolated, functional P-TEFb enzyme complex comprising a P-TEFb kinase subunit in 
operative association with a P-TEFb large subunit. 

41. A P-TEFb immunodetection reagent characterized as:

(a) an antibody that has immunospecificity for a P-TEFb subunit that includes a 
contiguous sequence of at least about 7 amino acids from SEQ ID NO:2, SEQ ID 
NO:4, SEQ ID NO:45, SEQ ID NO:47 or SEQ ID NO:50; or 

(b) an antibody that has immunospecificity for a P-TEFb subunit that includes a 
contiguous sequence of at least about 7 amino acids from SEQ ID NO:6, the 
antibody operatively attached to a detectable label. 

42. A method of identifying a gene that encodes a protein that interacts with a P-TEFb
subunit, comprising the steps of: 

(a) obtaining a first DNA segment comprising a candidate gene; said first DNA 
segment expressing a first fusion protein comprising a transcriptional 
transactivating domain operatively attached to the candidate protein encoded by 
said candidate gene; 

(b) obtaining a second DNA segment that expresses a second fusion protein 
comprising a P-TEFb subunit operatively attached to a DNA binding domain that 
binds to a defined nucleic acid sequence; 

(c) providing said first and second DNA segments to a eukaryotic host cell that 
comprises a marker gene operatively positioned downstream of said defined 
nucleic acid sequence; and 

(d) identifying a eukaryotic host cell that expresses said marker gene, thereby 
identifying said candidate gene as a gene that encodes a protein that interacts with 
a P-TEFb subunit. 



43. A method for identifying a candidate transcriptional inhibitor, comprising preparing a 
P-TEFb composition comprising at least a P-TEFb kinase subunit and testing said candidate 
inhibitor for the ability to inhibit P-TEFb-mediated phosphorylation of RNA polymerase II,
wherein inhibition of phosphorylation is indicative of a candidate transcriptional inhibitor. 

44. The method of claim 43, comprising the steps of: 

(a) obtaining a P-TEFb composition comprising at least a P-TEFb kinase subunit;

(b) obtaining an RNA polymerase II composition comprising at least the carboxyl 
terminal domain (CTD) of the large subunit of RNA polymerase II; 

(c) admixing said P-TEFb composition with said RNA polymerase II composition and 
an effective phosphate donor compound comprising a labeled phosphate group; 
and 

(d) determining the ability of said P-TEFb composition to transfer said labeled 
phosphate group to said RNA polymerase II composition in the presence of said 
candidate transcriptional inhibitor and in the absence of said candidate 
transcriptional inhibitor, wherein a reduction in the amount of labeled phosphate 
transferred to RNA polymerase II in the presence of said candidate is indicative of 
a candidate transcriptional inhibitor. 
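
The comparison in step (d), labeled phosphate transferred in the presence versus the absence of the candidate, is commonly summarized as a percent inhibition. The following is a minimal sketch of that arithmetic under stated assumptions; it is not taken from the specification. The function name `percent_inhibition`, the optional background subtraction, and the example counts are all hypothetical.

```python
def percent_inhibition(signal_without: float, signal_with: float,
                       background: float = 0.0) -> float:
    """Percent reduction in labeled phosphate transfer caused by a candidate.

    signal_without: label incorporated into the RNA polymerase II composition
                    with no candidate present (arbitrary units, e.g. counts).
    signal_with:    label incorporated with the candidate present.
    background:     assumed no-enzyme blank, subtracted from both readings.
    """
    control = signal_without - background
    if control <= 0:
        raise ValueError("uninhibited signal must exceed background")
    treated = signal_with - background
    return 100.0 * (1.0 - treated / control)


# Hypothetical readings: 1000 units without candidate, 250 units with it.
print(percent_inhibition(1000.0, 250.0))  # 75.0
```

A positive value indicates reduced transfer of the labeled phosphate group; how large a reduction qualifies a substance as a candidate inhibitor is not specified here and would have to be chosen by the user.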

45. The method of claim 43, wherein said P-TEFb composition comprises a P-TEFb enzyme 
complex that has transcription elongation promoting activity. 

46. The method of claim 45, further comprising testing the candidate transcriptional inhibitor 
so identified in a transcription elongation assay, wherein inhibition of transcription elongation 
confirms the identification of a transcriptional inhibitor. 

47. The method of claim 43, further comprising the step of purifying the candidate 
transcriptional inhibitor so identified. 

48. A method for identifying an HIV Tat protein, comprising contacting a composition
suspected of containing an HIV Tat protein with a human P-TEFb composition under conditions
effective to allow the formation of bound protein complexes and detecting the bound Tat:P-TEFb
protein complexes so formed.






49. A method for identifying a candidate viral transcription inhibitor, comprising testing a 
candidate substance for the ability to inhibit the binding of a viral transcriptional transactivator 
protein to at least one human P-TEFb subunit under effective binding conditions, wherein 
inhibition of binding is indicative of a candidate viral transcription inhibitor. 



50. The method of claim 49, wherein said viral transcriptional transactivator protein or said 
at least one human P-TEFb subunit is attached to a solid support.



51. The method of claim 49, wherein said effective binding conditions comprise admixing a
human cell nuclear extract with said viral transcriptional transactivator protein and said at least
one human P-TEFb subunit.



52. The method of claim 49, wherein said inhibition of binding is tested by determining the 
ability of the candidate substance to inhibit the formation of viral protein-P-TEFb subunit 
complexes that are detected by means of a detectable label attached to the viral protein or to the
P-TEFb subunit. 

53. The method of claim 49, wherein said inhibition of binding is tested by determining the
ability of the candidate substance to inhibit the formation of viral protein-P-TEFb subunit 
complexes that are detected by means of a first specific immunological detection reagent. 

54. The method of claim 53, wherein said inhibition of binding is tested by determining the 
ability of the candidate substance to inhibit the formation of viral protein-P-TEFb subunit 
complexes that are detected by means of a first and a second specific immunological detection 
reagent. 

55. The method of claim 49, further comprising testing the candidate viral transcription 
inhibitor so identified in a viral transcription elongation assay, wherein inhibition of viral 
transcription elongation confirms the identity of an active candidate viral transcription inhibitor. 

56. The method of claim 55, further comprising testing the active candidate viral 
transcription inhibitor next identified in separate human and viral transcription elongation assays, 
wherein inhibition of viral transcription elongation, and not human transcription elongation, in 
the presence of said viral transcriptional transactivator protein confirms the identification of an 
active viral transcription inhibitor. 

57. The method of claim 49, further comprising the step of purifying the candidate viral 
transcriptional inhibitor so identified. 

58. The method of claim 57, wherein said purified candidate viral transcriptional inhibitor is 
formulated in a pharmaceutically acceptable vehicle. 

59. A method for identifying a candidate viral transcription inhibitor, comprising testing a
candidate substance for the ability to inhibit viral RNA elongation in a functional viral 
transcription elongation assay, wherein inhibition of viral RNA elongation is indicative of an 
active candidate viral transcription inhibitor. 

60. The method of claim 59, further comprising testing the active candidate viral 
transcription inhibitor in distinct human and viral transcription elongation assays, wherein the 
inhibition of viral, but not human, transcription elongation confirms the identity of an active 
candidate viral transcription inhibitor. 

61. The method of claim 60, wherein said method comprises the steps of: 

(a) preparing a first transcriptionally competent composition capable of generating 
elongated human RNA transcripts, said composition comprising effective amounts 
of human nucleic acid template, P-TEFb enzyme complex, RNA polymerase II,
the required nucleotide triphosphates and ATP; 

(b) preparing a second transcriptionally competent composition capable of generating 
elongated viral RNA transcripts, said composition comprising effective amounts 
of viral nucleic acid template, viral transcriptional transactivator protein, P-TEFb 
enzyme complex, RNA polymerase II, the required nucleotide triphosphates and
ATP; and 

(c) identifying a viral transcription inhibitor that inhibits the generation of elongated 
viral RNA transcripts by said second transcriptionally competent composition but 
that does not inhibit the generation of elongated human RNA transcripts by said 
first transcriptionally competent composition. 
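
The decision in step (c), inhibition of the viral assay without inhibition of the human assay, can be sketched as a comparison of two percent-inhibition readouts. This is an illustrative sketch only: the function name `is_selective_viral_inhibitor` and both cut-off values are hypothetical assumptions, since no numerical thresholds are recited above.

```python
def is_selective_viral_inhibitor(viral_inhibition_pct: float,
                                 human_inhibition_pct: float,
                                 viral_min_pct: float = 50.0,
                                 human_max_pct: float = 10.0) -> bool:
    """Flag a candidate that inhibits viral but not human RNA elongation.

    viral_inhibition_pct: percent reduction of elongated viral transcripts
                          in the second (viral) composition.
    human_inhibition_pct: percent reduction of elongated human transcripts
                          in the first (human) composition.
    viral_min_pct / human_max_pct: assumed cut-offs chosen for illustration.
    """
    inhibits_viral = viral_inhibition_pct >= viral_min_pct
    spares_human = human_inhibition_pct <= human_max_pct
    return inhibits_viral and spares_human


# Hypothetical candidates: (viral % inhibition, human % inhibition).
print(is_selective_viral_inhibitor(80.0, 5.0))   # True: viral assay only
print(is_selective_viral_inhibitor(80.0, 70.0))  # False: both assays inhibited
print(is_selective_viral_inhibitor(5.0, 2.0))    # False: neither assay inhibited
```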

62. The method of claim 61, wherein said candidate viral transcription inhibitor is a
candidate HIV transcription inhibitor and wherein said method comprises the steps of:

(a) preparing a first transcriptionally competent composition capable of generating 
elongated human RNA transcripts, said composition comprising effective amounts 
of human nucleic acid template, P-TEFb enzyme complex, RNA polymerase II, 
the required nucleotide triphosphates and ATP; 


(b) preparing a second transcriptionally competent composition capable of generating 
elongated HIV RNA transcripts, said composition comprising effective amounts 
of HIV nucleic acid template, HIV Tat protein, P-TEFb enzyme complex, RNA 
polymerase II, the required nucleotide triphosphates and ATP; and


(c) identifying an HIV inhibitor that inhibits the generation of elongated HIV RNA 
transcripts by said second transcriptionally competent composition but that does 
not inhibit the generation of elongated human RNA transcripts by said first 
transcriptionally competent composition. 


63. The method of claim 59, further comprising the step of purifying the candidate viral 
transcriptional inhibitor so identified. 


64. The method of claim 63, wherein said purified candidate viral transcriptional inhibitor is 
formulated in a pharmaceutically acceptable vehicle.






65. A method for inhibiting viral replication, comprising contacting a cell suspected of being 
infected with a virus with a biologically effective amount of a viral transcription inhibitor 
identified by the method of claim 59. 


66. The method of claim 65, wherein said cell is a cell suspected of being infected with HIV. 

67. The method of claim 65, wherein said cell is located within an animal and a 
therapeutically effective amount of said inhibitor is administered to said animal.

ABSTRACT



Disclosed is the discovery that the transcription elongation factor termed P-TEFb has a 
central role in transcription elongation control. P-TEFb is herein shown to phosphorylate RNA 
polymerase II and to control the transition from abortive into productive elongation mode.
P-TEFb has also been discovered to interact with the HIV transcriptional transactivating protein, 
Tat, showing that P-TEFb is the cellular factor necessary for HIV Tat to effect productive viral 
mRNA elongation. The invention provides genes encoding P-TEFb subunits, including human 
genes, and related biological components, and also provides assay methods connected with the 
control of transcription elongation. Particularly useful assays are those concerning the
identification of substances that inhibit viral replication at the transcription elongation stage by 
inhibiting the binding or functional interaction of viral proteins to P-TEFb. 

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Price, David H.

(ii) TITLE OF INVENTION: P-TEFb COMPOSITIONS, METHODS AND 
SCREENING ASSAYS 

(iii) NUMBER OF SEQUENCES: 68 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Arnold, White & Durkee 

(B) STREET: P.O. Box 4433 

(C) CITY: Houston 

(D) STATE: TX 

(E) COUNTRY: USA 

(F) ZIP: 77210-4433 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US Unknown 

(B) FILING DATE: Concurrently Herewith 

(C) CLASSIFICATION: Unknown 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Fussey, Shelley P.M. 

(B) REGISTRATION NUMBER: 39,458 

(C) REFERENCE/DOCKET NUMBER: IOWA:012

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (512) 418-3000 

(B) TELEFAX: (512) 418-3131 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1457 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 115..1326

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



TGTTGAGTCA ACAGCTGTAG ATACACCAAT TGTTGCCGAT TTCTTTCTTT TCGACTGTCG 60

GCTTCTCGCG AAACTGTGAT TGTGAAAATT GTACAAATAG AGGCAAATTT AACC ATG 117
Met
1

GCG CAC ATG TCC CAC ATG CTC CAG CAG CCT TCG GGG TCG ACG CCC TCC 165
Ala His Met Ser His Met Leu Gln Gln Pro Ser Gly Ser Thr Pro Ser
5 10 15

AAC GTG GGC TCC AGC TCA TCG CGC ACG ATG TCC CTG ATG GAG AAA CAA 213
Asn Val Gly Ser Ser Ser Ser Arg Thr Met Ser Leu Met Glu Lys Gln
20 25 30

AAG TAC ATC GAG GAC TAC GAC TTT CCC TAC TGC GAC GAG AGC AAC AAA 261
Lys Tyr Ile Glu Asp Tyr Asp Phe Pro Tyr Cys Asp Glu Ser Asn Lys
35 40 45

TAC GAA AAG GTG GCG AAA ATT GGC CAA GGC ACC TTC GGA GAG GTT TTT 309
Tyr Glu Lys Val Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu Val Phe
50 55 60 65

AAG GCT CGC GAG AAA AAG GGC AAC AAG AAG TTT GTG GCC ATG AAG AAG 357
Lys Ala Arg Glu Lys Lys Gly Asn Lys Lys Phe Val Ala Met Lys Lys
70 75 80

GTG CTG ATG GAC AAC GAA AAG GAG GGC TTT CCC ATC ACG GCT CTG CGA 405
Val Leu Met Asp Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu Arg
85 90 95

GAG ATC CGC ATC CTG CAG CTG CTA AAG CAC GAG AAC GTG GTG AAT CTG 453
Glu Ile Arg Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn Leu
100 105 110

ATC GAG ATC TGC CGC ACC AAG GCC ACC GCC ACG AAT GGT TAC AGA TCC 501
Ile Glu Ile Cys Arg Thr Lys Ala Thr Ala Thr Asn Gly Tyr Arg Ser
115 120 125

ACC TTC TAT TTG GTC TTT GAT TTC TGC GAA CAC GAT TTG GCA GGT CTT 549
Thr Phe Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly Leu
130 135 140 145

CTG TCC AAC ATG AAC GTC AAG TTC AGT CTG GGC GAG ATT AAG AAG GTT 597
Leu Ser Asn Met Asn Val Lys Phe Ser Leu Gly Glu Ile Lys Lys Val
150 155 160

ATG CAG CAG CTT TTA AAC GGT TTG TAT TAC ATC CAC AGC AAC AAG ATC 645
Met Gln Gln Leu Leu Asn Gly Leu Tyr Tyr Ile His Ser Asn Lys Ile
165 170 175

CTG CAC CGA GAC ATG AAA GCT GCC AAC GTG CTG ATT ACC AAG CAT GGC 693
Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Lys His Gly
180 185 190

ATC TTA AAG CTG GCT GAC TTT GGC TTG GCC CGT GCT TTT AGC ATT CCA 741
Ile Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Ile Pro
195 200 205

AAG AAC GAG AGT AAG AAT CGC TAT ACC AAT CGC GTA GTA ACC TTG TGG 789
Lys Asn Glu Ser Lys Asn Arg Tyr Thr Asn Arg Val Val Thr Leu Trp
210 215 220 225

TAC CGG CCG CCT GAG CTG CTA CTT GGT GAC CGC AAC TAT GGT CCA CCC 837
Tyr Arg Pro Pro Glu Leu Leu Leu Gly Asp Arg Asn Tyr Gly Pro Pro
230 235 240

GTG GAC ATG TGG GGA GCC GGC TGC ATA ATG GCC GAG ATG TGG ACA CGC 885
Val Asp Met Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr Arg
245 250 255

TCG CCC ATC ATG CAA GGC AAT ACG GAG CAG CAG CAG TTA ACC TTT ATT 933
Ser Pro Ile Met Gln Gly Asn Thr Glu Gln Gln Gln Leu Thr Phe Ile
260 265 270

TCG CAG CTA TGC GGC TCC TTT ACG CCG GAC GTG TGG CCG GGA GTG GAG 981
Ser Gln Leu Cys Gly Ser Phe Thr Pro Asp Val Trp Pro Gly Val Glu
275 280 285

GAG CTG GAG CTG TAC AAA TCC ATC GAG CTG CCA AAG AAC CAG AAG CGT 1029
Glu Leu Glu Leu Tyr Lys Ser Ile Glu Leu Pro Lys Asn Gln Lys Arg
290 295 300 305

CGA GTC AAG GAG CGC CTG CGT CCG TAT GTC AAG GAT CAA ACC GGC TGT 1077
Arg Val Lys Glu Arg Leu Arg Pro Tyr Val Lys Asp Gln Thr Gly Cys
310 315 320

GAT CTA TTG GAC AAA TTG CTG ACC CTT GAT CCC AAG AAA CGC ATC GAT 1125
Asp Leu Leu Asp Lys Leu Leu Thr Leu Asp Pro Lys Lys Arg Ile Asp
325 330 335

GCG GAC ACA GCT CTG AAT CAC GAC TTC TTC TGG ACG GAT CCC ATG CCC 1173
Ala Asp Thr Ala Leu Asn His Asp Phe Phe Trp Thr Asp Pro Met Pro
340 345 350

AGC GAC TTG AGC AAG ATG CTG TCC CAG CAC CTG CAG AGC ATG TTC GAG 1221
Ser Asp Leu Ser Lys Met Leu Ser Gln His Leu Gln Ser Met Phe Glu
355 360 365

TAC CTG GCG CAG CCA CGC CGC AGC AAC CAG ATG CGC AAC TAT CAC CAG 1269
Tyr Leu Ala Gln Pro Arg Arg Ser Asn Gln Met Arg Asn Tyr His Gln
370 375 380 385

CAA CTG ACC ACC ATG AAC CAG AAG CCC CAG GAC AAC AGT ATG ATT GAC 1317
Gln Leu Thr Thr Met Asn Gln Lys Pro Gln Asp Asn Ser Met Ile Asp
390 395 400

CGG GTT TGG TAGACTGCCA GAGGTGTACG CACCCGACTA ATAGTTTCTC 1366
Arg Val Trp

ACCTTCAACT AGCGTTAGGT TATTAGGTTA GTGTACAATA AAAATATTGG CATTTGCATT 1426
AGCGCTTGCT CCAAATATAA AAAAAAAAAA A 1457
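
The CDS feature given for SEQ ID NO:1 (location 115..1326) spans 1326 - 115 + 1 = 1212 nucleotides, that is 404 codons, which agrees with the 404 amino acids stated for SEQ ID NO:2. The following minimal sketch, which is not part of the original listing, checks that arithmetic and translates the start of the coding region using the standard genetic code. The function names are hypothetical, and only the first 165 nucleotides of the sequence are reproduced.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# First 165 nucleotides of SEQ ID NO:1 as printed (spacing removed).
SEQ1_PREFIX = (
    "TGTTGAGTCAACAGCTGTAGATACACCAATTGTTGCCGATTTCTTTCTTTTCGACTGTCG"
    "GCTTCTCGCGAAACTGTGATTGTGAAAATTGTACAAATAGAGGCAAATTTAACCATG"
    "GCGCACATGTCCCACATGCTCCAGCAGCCTTCGGGGTCGACGCCCTCC"
)


def cds_codon_count(start: int, end: int) -> int:
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


def translate(dna: str) -> str:
    """Translate complete codons of dna into one-letter amino acid code."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(cds_codon_count(115, 1326))    # 404
print(translate(SEQ1_PREFIX[114:]))  # MAHMSHMLQQPSGSTPS
```

The translated prefix corresponds to Met Ala His Met Ser His Met Leu Gln Gln Pro Ser Gly Ser Thr Pro Ser, the first 17 residues shown beneath the nucleotide sequence.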

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 404 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ala His Met Ser His Met Leu Gln Gln Pro Ser Gly Ser Thr Pro
1 5 10 15

Ser Asn Val Gly Ser Ser Ser Ser Arg Thr Met Ser Leu Met Glu Lys
20 25 30

Gln Lys Tyr Ile Glu Asp Tyr Asp Phe Pro Tyr Cys Asp Glu Ser Asn
35 40 45

Lys Tyr Glu Lys Val Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu Val
50 55 60

Phe Lys Ala Arg Glu Lys Lys Gly Asn Lys Lys Phe Val Ala Met Lys
65 70 75 80

Lys Val Leu Met Asp Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu
85 90 95

Arg Glu Ile Arg Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn
100 105 110

Leu Ile Glu Ile Cys Arg Thr Lys Ala Thr Ala Thr Asn Gly Tyr Arg
115 120 125

Ser Thr Phe Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly
130 135 140

Leu Leu Ser Asn Met Asn Val Lys Phe Ser Leu Gly Glu Ile Lys Lys
145 150 155 160

Val Met Gln Gln Leu Leu Asn Gly Leu Tyr Tyr Ile His Ser Asn Lys
165 170 175

Ile Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Lys His
180 185 190

Gly Ile Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Ile
195 200 205

Pro Lys Asn Glu Ser Lys Asn Arg Tyr Thr Asn Arg Val Val Thr Leu
210 215 220

Trp Tyr Arg Pro Pro Glu Leu Leu Leu Gly Asp Arg Asn Tyr Gly Pro
225 230 235 240

Pro Val Asp Met Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr
245 250 255

Arg Ser Pro Ile Met Gln Gly Asn Thr Glu Gln Gln Gln Leu Thr Phe
260 265 270

Ile Ser Gln Leu Cys Gly Ser Phe Thr Pro Asp Val Trp Pro Gly Val
275 280 285

Glu Glu Leu Glu Leu Tyr Lys Ser Ile Glu Leu Pro Lys Asn Gln Lys
290 295 300

Arg Arg Val Lys Glu Arg Leu Arg Pro Tyr Val Lys Asp Gln Thr Gly
305 310 315 320

Cys Asp Leu Leu Asp Lys Leu Leu Thr Leu Asp Pro Lys Lys Arg Ile
325 330 335

Asp Ala Asp Thr Ala Leu Asn His Asp Phe Phe Trp Thr Asp Pro Met
340 345 350

Pro Ser Asp Leu Ser Lys Met Leu Ser Gln His Leu Gln Ser Met Phe
355 360 365

Glu Tyr Leu Ala Gln Pro Arg Arg Ser Asn Gln Met Arg Asn Tyr His
370 375 380

Gln Gln Leu Thr Thr Met Asn Gln Lys Pro Gln Asp Asn Ser Met Ile
385 390 395 400

Asp Arg Val Trp



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4328 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 



CAGCCCTGCC GACGGCCATA CTTGAAAATA CATTTTTTTC TGCAAAGTTT GTCATTGTCA 60

CTGTGTGAAT GGAATCTGTG ATGTGTTGTG GAATTAAAAA CGTCAAGTAA ACAACCCGTA 120

ATGGTTAAAG TGCACGGCGA AAGCAGTGCG AATAACTATG AATTGATACA AAAGTTGCAT 180

AACACGTCGC CTGGTGTCGC GGTTAGTGTG TTTTTCGTCT CGTTTCGTTT CCGCCGCAGT 240

CGCAGTTTCC AAAAAACCTC ACCACACCAT ACCATCTCCA CCACGCACAC ACACACACAA 300

ACAAACACGC AGAGACGCGG CGGCGGAAAA AGTGTGCGGA CCGCGGATTT AACCCCTCGT 360

TCCAAACCCA AATTGGAGTC TCCCAAAAAC AGCGAAATAT CGAGTGTGGC TTAGCCGATG 420

TGCCGTGCGA TCCCCACTGC CCCTTCCGTA CCGCTGCCAC CCCCGCCACA GCAGCAACGC 480

ACACGGATAC GGACACAGAC ACCAATACCA GCGCACTCAA GCACGGCCGA CAAAGAAAGA 540

GCGCTCTCCC TTCCTCTTTG TACAGTTAGT TCCTACAGCT GAATCAGCCA AAAGAAATTA 600

CTAGGTCCAT TCCGAGGCGC AGTTTGCATG TGAAACGGAG GTCCCCGCAT AACCACGCGG 660

AACCCGAAAT TCCAGATCCC CATCTCCGCT GCACGGATAA AGGAAACATA CAACCATGAG 720

TCTCCTAGCC ACGCCAATGC CCCAGGCGGC CACCGCCTCA TCTTCTTCAT CCGCCTCCGC 780

GGCCGCCTCG GCCAGCGGGA TTCCAATCAC CGCCAACAAC AACCTGCCTT TCGAGAAGGA 840

CAAGATCTGG TACTTCAGCA ACGATCAGCT GGCCAATTTG CCAAGCAGAA GATGCGGCAT 900

CAAGGGCGAC GATGAGCTGC AGTACCGCCA GATGACCGCC TATCTGATAC AGGAAATGGG 960

TCAGCGTCTG CAGGTGTCCC AACTGTGCAT CAACACGGCC ATTGTGTACA TGCATCGGTT 1020

CTACGCCTTT CACTCCTTCA CCCACTTTCA TCGCAACTCC ATGGCGTCGG CGAGCCTCTT 1080

CTTGGCCGCC AAGGTAGAAG AGCAACCGCG GAAGCTGGAG CATGTTATTC GGGCCGCCAA 1140

CAAGTGCCTG CCGCCGACCA CCGAGCAGAA TTACGCCGAA CTCGCCCAGG AGCTTGTGTT 1200

CAACGAGAAC GTGCTCCTGC AGACGCTGGG CTTCGATGTG GCCATCGATC ATCCGCACAC 1260

GCATGTGGTG CGCACCTGCC AGCTGGTCAA AGCATGCAAG GATCTGGCGC AGACATCGTA 1320

CTTCTTGGCC TCGAACAGCC TGCATCTGAC CTCGATGTGC CTCCAATATC GCCCCACGGT 1380

CGTAGCCTGT TTCTGCATTT ACCTAGCCTG CAAGTGGTCC CGATGGGAGA TCCCCCAGTC 1440

GACCGAGGGC AAGCACTGGT TCTACTATGT GGACAAGACG GTCTCGCTGG ATTTGCTAAA 1500

GCAGCTGACA GATGAGTTCA TCGCTATCTA TGAGAAGAGC CCGGCCCGTC TGAAGTCTAA 1560

GCTTAACTCG ATCAAGGCGA TCGCCCAGGG AGCCAGCAAT CGGACAGCTA ACAGCAAGGA 1620

CAAACCAAAG GAGGACTGGA AGATCACCGA GATGATGAAG GGCTACCACT CAAACATCAC 1680

GACACCACCA GAGCTGTTAA ACGGCAACGA CAGCCGGGAT CGGGACCGAG ATCGTGAACG 1740

GGAGAGAGAG CGGGAACGGG ATCCGTCGTC ACTACTGCCG CCACCGGCTA TGGTGCCGCA 1800

GCAAAGACGA CAGGATGGTG GACATCAGCG CTCGTCCTCA GTGAGCGGAG TGCCAGGCAG 1860

CAGCTCTTCG TCGTCTTCCT CCAGTCACAA GATGCCAAAT TACCCTGGTG GCATGCCGCC 1920

CGAAGCTCAT CCGGATCACA AGTCAAAGCA GCCGGGCTAT AACAATCGAA TGCCCTCAAG 1980

TCACCAGCGT AGTAGTAGCA GTGGACTCGG TTCCTCGGGA AGTGGCAGCC AGCACAGCAG 2040

CTCATCCTCG TCGTCTTCAA GCCAGCAGCC TGGCCGACCG TCTATGCCCG TGGACTATCA 2100

CAAATCCTCT CGCGGCATGC CGCCGGTAGG CGTGGGCATG CCACCTCACG GCAGCCACAA 2160

GATGACTTCG GGCTCCAAGC CTCAACAGCC GCAGCAGCAG CCGGTCCCAC ATCCATCCGC 2220

CTCTAATTCC TCTGCATCGG GCATGTCCTC CAAGGATAAA TCCCAGAGCA ACAAAATGTA 2280

TCCGAACGCA CCGCCGCCAT ACAGTAATAG TGCCCCTCAA AACCCGCTGA TGTCGCGTGG 2340

TGGATATCCA GGCGCTAGCA ATGGATCCCA GCCCCCGCCT CCCGCCGGAT ACGGCGGCCA 2400

TCGCAGCAAA TCCGGCTCCA CCGTCCATGG CATGCCGCAT TTCGAGCAGC AATTGCCCTA 2460

TTCCCAGAGC CAGAGCTACG GCCACATGCA GCAGCAGCCA GTGCCTCAGT CTCAGCAGCA 2520

ACAGATGCCT CCGGAGGCAT CCCAGCACTC GTTGCAGTCC AAGAACTCGC TCTTCAGTCC 2580

AGAGTGGCCA GACATTAAAA AGGAGCCCAT GTCGCAGTCG CAACCACAGC TTTTTAACGG 2640

TTTGCTACCC CCTCCTGCGC CTCCCGGCCA CGATTACAAG CTAAATAGCC ATCCGCGCGA 2700

CAAAGAAAGT CCCAAGAAAG AGCGACTAAC GCCAACCAAA AAGGATAAGC ACCGTCCTGT 2760

AATGCCCCCA ATGGGCAGTG GGAACAGTTC CTCCGGCTCG GGATCATCAA AGCCGATGCT 2820

ACCGCCTCAC AAGAAGCAGA TACCCCATGG CGGGGACCTG TTGACCAATC CTGGAGAGAG 2880

TGGAAGCCTA AAACGGCCCA ACGAGATCTC GGGAAGTCAG TATGGACTAA ATAAGCTGGA 2940

TGAAATAGAT AACAGTAATA TGCCTCGAGA AAAGCTTCGC AAGCTGGACA CTACAACTGG 3000

ACTACCAACT TATCCGAATT ATGAGGAGAA ACACACGCCT CTGAATATGT CCAACGGAAT 3060

CGAGACAACG CCGGATCTGG TGCGCAGTTT GCTAAAGGAG AGTCTGTGTC CATCGAACGC 3120

TTCGCTCCTG AAACCGGATG CCTTGACTAT GCCTGGCCTG AAACCACCGG CCGAACTACT 3180

TGAGCCCATG CCCGCACCAG CGACAATCAA GAAAGAACAG GGAATAACTC CGATGACCAG 3240

TTTGGCTAGT GGGCCCGCAC CCATGGATTT GGAAGTACCC ACTAAACAGG CCGGAGAGAT 3300 

TAAGGAGGAA AGCAGCAGCA AGTCCGAAAA GAAAAAGAAG AAGGATAAAC ACAAACACAA 3360 

GGAGAAGGAC AAGTCCAAGG ACAAGACGGA AAAGGAGGAG CGTAAGAAGC ACAAGAGGGA 3420 

CAAGCAGAAG GATCGTAGCG GCAGCGGTGG CAGCAAGGAC AGTTCTCTTC CCAATGAGCC 3480 

TCTGAAGATG GTTATCAAGA ATCCCAACGG CAGCCTGCAG GCCGGTGCGT CAGCTCCCAT 3540 

TAAACTTAAG ATCAGCAAAA ATAAGGTTGA ACCCAATAAC TACTCTGCAG CGGCGGGTCT 3600 

GCCTGGCGCA ATCGGATATG GCTTGCCTCC AACTACGGCT ACCACCACAT CCGCTTCGAT 3660 

CGGAGCAGCT GCTCCTGTTC TGCCTCCTTA TGGTGCCGGC GGTGGTGGCT ACAGCTCATC 3720 

GGGCGGCAGC AGTTCCGGTG GCAGCAGCAA GAAAAAGCAC AGCGATCGTG ACCGCGACAA 3780 

GGAGAGCAAA AAGAATAAGA GCCAAGACTA CGCGAAGTAC AATGGCGCTG GTGGCGGCAT 3840 

CTTTAATCCC CTTGGCGGTG CTGGCGCCGC ACCCAATATG TCTGGAGGAA TGGGCGCCCC 3900

CATGTCTACT GCTGTACCAC CATCCATGCT GTTGGCGCCC ACCGGTGCAG TACCACCCTC 3960

TGCCGCTGGG CTGGCACCGC CTCCCATGCC CGTCTACAAC AAGAAGTAGT GGTAGCGGTC 4020

AGAGGGTTAT TCTTAAGTCG TACGTTTTGA TATATGTATA GAACCTCAGT AAGTCCGATT 4080 

GTAGTATAGT TGTTAGGATT GTTAGTGAGA TGCATTATTG ATTTTAGTTA AGCACATAGA 4140 

TAAAACTCCA AATTGGAAGT GAAACCGGAT GCGCAGATCG AAGAAGAATG GAAGTAGATG 4200 

TCGCGATGGG GCTGGACGTA AAAGCAGTAC TCAAATCGCG AAAACTTTTG TACAGCATTA 4260 

ATTAGTTTAT AACTATAATA AATAGCATAC ATATAAGCCC AAAAAAAAAA AAAAAAAAAA 4320 

AAAAAAAA 4328 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1097 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Ser Leu Leu Ala Thr Pro Met Pro Gln Ala Ala Thr Ala Ser Ser
1 5 10 15

Ser Ser Ser Ala Ser Ala Ala Ala Ser Ala Ser Gly Ile Pro Ile Thr
20 25 30

Ala Asn Asn Asn Leu Pro Phe Glu Lys Asp Lys Ile Trp Tyr Phe Ser
35 40 45

Asn Asp Gln Leu Ala Asn Leu Pro Ser Arg Arg Cys Gly Ile Lys Gly
50 55 60

Asp Asp Glu Leu Gln Tyr Arg Gln Met Thr Ala Tyr Leu Ile Gln Glu
65 70 75 80

Met Gly Gln Arg Leu Gln Val Ser Gln Leu Cys Ile Asn Thr Ala Ile
85 90 95

Val Tyr Met His Arg Phe Tyr Ala Phe His Ser Phe Thr His Phe His
100 105 110

Arg Asn Ser Met Ala Ser Ala Ser Leu Phe Leu Ala Ala Lys Val Glu
115 120 125

Glu Gln Pro Arg Lys Leu Glu His Val Ile Arg Ala Ala Asn Lys Cys
130 135 140

Leu Pro Pro Thr Thr Glu Gln Asn Tyr Ala Glu Leu Ala Gln Glu Leu
145 150 155 160

Val Phe Asn Glu Asn Val Leu Leu Gln Thr Leu Gly Phe Asp Val Ala
165 170 175

Ile Asp His Pro His Thr His Val Val Arg Thr Cys Gln Leu Val Lys
180 185 190

Ala Cys Lys Asp Leu Ala Gln Thr Ser Tyr Phe Leu Ala Ser Asn Ser
195 200 205

Leu His Leu Thr Ser Met Cys Leu Gln Tyr Arg Pro Thr Val Val Ala
210 215 220

Cys Phe Cys Ile Tyr Leu Ala Cys Lys Trp Ser Arg Trp Glu Ile Pro
225 230 235 240

Gln Ser Thr Glu Gly Lys His Trp Phe Tyr Tyr Val Asp Lys Thr Val
245 250 255

Ser Leu Asp Leu Leu Lys Gln Leu Thr Asp Glu Phe Ile Ala Ile Tyr
260 265 270

Glu Lys Ser Pro Ala Arg Leu Lys Ser Lys Leu Asn Ser Ile Lys Ala
275 280 285

Ile Ala Gln Gly Ala Ser Asn Arg Thr Ala Asn Ser Lys Asp Lys Pro
290 295 300

Lys Glu Asp Trp Lys Ile Thr Glu Met Met Lys Gly Tyr His Ser Asn
305 310 315 320

Ile Thr Thr Pro Pro Glu Leu Leu Asn Gly Asn Asp Ser Arg Asp Arg
325 330 335

Asp Arg Asp Arg Glu Arg Glu Arg Glu Arg Glu Arg Asp Pro Ser Ser
340 345 350

Leu Leu Pro Pro Pro Ala Met Val Pro Gln Gln Arg Arg Gln Asp Gly
355 360 365

Gly His Gln Arg Ser Ser Ser Val Ser Gly Val Pro Gly Ser Ser Ser
370 375 380

Ser Ser Ser Ser Ser Ser His Lys Met Pro Asn Tyr Pro Gly Gly Met
385 390 395 400

Pro Pro Glu Ala His Pro Asp His Lys Ser Lys Gln Pro Gly Tyr Asn
405 410 415

Asn Arg Met Pro Ser Ser His Gln Arg Ser Ser Ser Ser Gly Leu Gly
420 425 430

Ser Ser Gly Ser Gly Ser Gln His Ser Ser Ser Ser Ser Ser Ser Ser
435 440 445

Ser Gln Gln Pro Gly Arg Pro Ser Met Pro Val Asp Tyr His Lys Ser
450 455 460

Ser Arg Gly Met Pro Pro Val Gly Val Gly Met Pro Pro His Gly Ser
465 470 475 480

His Lys Met Thr Ser Gly Ser Lys Pro Gln Gln Pro Gln Gln Gln Pro
485 490 495

Val Pro His Pro Ser Ala Ser Asn Ser Ser Ala Ser Gly Met Ser Ser
500 505 510

Lys Asp Lys Ser Gln Ser Asn Lys Met Tyr Pro Asn Ala Pro Pro Pro
515 520 525

Tyr Ser Asn Ser Ala Pro Gln Asn Pro Leu Met Ser Arg Gly Gly Tyr
530 535 540

Pro Gly Ala Ser Asn Gly Ser Gln Pro Pro Pro Pro Ala Gly Tyr Gly
545 550 555 560

Gly His Arg Ser Lys Ser Gly Ser Thr Val His Gly Met Pro His Phe
565 570 575

Glu Gln Gln Leu Pro Tyr Ser Gln Ser Gln Ser Tyr Gly His Met Gln
580 585 590

Gln Gln Pro Val Pro Gln Ser Gln Gln Gln Gln Met Pro Pro Glu Ala
595 600 605

Ser Gln His Ser Leu Gln Ser Lys Asn Ser Leu Phe Ser Pro Glu Trp
610 615 620

Pro Asp Ile Lys Lys Glu Pro Met Ser Gln Ser Gln Pro Gln Leu Phe
625 630 635 640

Asn Gly Leu Leu Pro Pro Pro Ala Pro Pro Gly His Asp Tyr Lys Leu
645 650 655

Asn Ser His Pro Arg Asp Lys Glu Ser Pro Lys Lys Glu Arg Leu Thr
660 665 670

Pro Thr Lys Lys Asp Lys His Arg Pro Val Met Pro Pro Met Gly Ser
675 680 685

Gly Asn Ser Ser Ser Gly Ser Gly Ser Ser Lys Pro Met Leu Pro Pro
690 695 700

His Lys Lys Gln Ile Pro His Gly Gly Asp Leu Leu Thr Asn Pro Gly
705 710 715 720

Glu Ser Gly Ser Leu Lys Arg Pro Asn Glu Ile Ser Gly Ser Gln Tyr
725 730 735

Gly Leu Asn Lys Leu Asp Glu Ile Asp Asn Ser Asn Met Pro Arg Glu
740 745 750

Lys Leu Arg Lys Leu Asp Thr Thr Thr Gly Leu Pro Thr Tyr Pro Asn
755 760 765

Tyr Glu Glu Lys His Thr Pro Leu Asn Met Ser Asn Gly Ile Glu Thr
770 775 780

Thr Pro Asp Leu Val Arg Ser Leu Leu Lys Glu Ser Leu Cys Pro Ser
785 790 795 800

Asn Ala Ser Leu Leu Lys Pro Asp Ala Leu Thr Met Pro Gly Leu Lys
805 810 815

Pro Pro Ala Glu Leu Leu Glu Pro Met Pro Ala Pro Ala Thr Ile Lys
820 825 830

Lys Glu Gln Gly Ile Thr Pro Met Thr Ser Leu Ala Ser Gly Pro Ala
835 840 845

Pro Met Asp Leu Glu Val Pro Thr Lys Gln Ala Gly Glu Ile Lys Glu
850 855 860

Glu Ser Ser Ser Lys Ser Glu Lys Lys Lys Lys Lys Asp Lys His Lys
865 870 875 880

His Lys Glu Lys Asp Lys Ser Lys Asp Lys Thr Glu Lys Glu Glu Arg
885 890 895

Lys Lys His Lys Arg Asp Lys Gln Lys Asp Arg Ser Gly Ser Gly Gly
900 905 910

Ser Lys Asp Ser Ser Leu Pro Asn Glu Pro Leu Lys Met Val Ile Lys
915 920 925

Asn Pro Asn Gly Ser Leu Gln Ala Gly Ala Ser Ala Pro Ile Lys Leu
930 935 940

Lys Ile Ser Lys Asn Lys Val Glu Pro Asn Asn Tyr Ser Ala Ala Ala
945 950 955 960

Gly Leu Pro Gly Ala Ile Gly Tyr Gly Leu Pro Pro Thr Thr Ala Thr
965 970 975

Thr Thr Ser Ala Ser Ile Gly Ala Ala Ala Pro Val Leu Pro Pro Tyr
980 985 990

Gly Ala Gly Gly Gly Gly Tyr Ser Ser Ser Gly Gly Ser Ser Ser Gly
995 1000 1005

Gly Ser Ser Lys Lys Lys His Ser Asp Arg Asp Arg Asp Lys Glu Ser
1010 1015 1020

Lys Lys Asn Lys Ser Gln Asp Tyr Ala Lys Tyr Asn Gly Ala Gly Gly
1025 1030 1035 1040

Gly Ile Phe Asn Pro Leu Gly Gly Ala Gly Ala Ala Pro Asn Met Ser
1045 1050 1055

Gly Gly Met Gly Ala Pro Met Ser Thr Ala Val Pro Pro Ser Met Leu
1060 1065 1070

Leu Ala Pro Thr Gly Ala Val Pro Pro Ser Ala Ala Gly Leu Ala Pro
1075 1080 1085

Pro Pro Met Pro Val Tyr Asn Lys Lys
1090 1095



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1119 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1116

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



ATG GCA AAG CAG TAC GAC TCG GTG GAG TGC CCT TTT TGT GAT GAA GTT 48
Met Ala Lys Gln Tyr Asp Ser Val Glu Cys Pro Phe Cys Asp Glu Val
1 5 10 15

TCC AAA TAC GAG AAG CTC GCC AAG ATC GGC CAA GGC ACC TTC GGG GAG 96
Ser Lys Tyr Glu Lys Leu Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu
20 25 30

GTG TTC AAG GCC AGG CAC CGC AAG ACC GGC CAG AAG GTG GCT CTG AAG 144
Val Phe Lys Ala Arg His Arg Lys Thr Gly Gln Lys Val Ala Leu Lys
35 40 45

AAG GTG CTG ATG GAA AAC GAG AAG GAG GGG TTC CCC ATT ACA GCC TTG 192
Lys Val Leu Met Glu Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu
50 55 60

CGG GAG ATC AAG ATC CTT CAG CTT CTA AAA CAC GAG AAT GTG GTC AAC 240
Arg Glu Ile Lys Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn
65 70 75 80

TTG ATT GAG ATT TGT CGA ACC AAA GCT TCC CCC TAT AAC CGC TGC AAG 288
Leu Ile Glu Ile Cys Arg Thr Lys Ala Ser Pro Tyr Asn Arg Cys Lys
85 90 95

GGT AGT ATA TAC CTG GTG TTC GAC TTC TGC GAG CAT GAC CTT GCT GGG 336
Gly Ser Ile Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly
100 105 110

CTG TTG AGC AAT GTT TTG GTC AAG TTC ACG CTG TCT GAG ATC AAG AGG 384
Leu Leu Ser Asn Val Leu Val Lys Phe Thr Leu Ser Glu Ile Lys Arg
115 120 125

GTG ATG CAG ATG CTG CTT AAC GGC CTC TAC TAC ATC CAC AGA AAC AAG 432
Val Met Gln Met Leu Leu Asn Gly Leu Tyr Tyr Ile His Arg Asn Lys
130 135 140

ATC CTG CAT AGG GAC ATG AAG GCT GCT AAT GTG CTT ATC ACT CGT GAT 480
Ile Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Arg Asp
145 150 155 160

GGG GTC CTG AAG CTG GCA GAC TTT GGG CTG GCC CGG GCC TTC AGC CTG 528
Gly Val Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Leu
165 170 175

GCC AAG AAC AGC CAG CCC AAC CGC TAC ACC AAC CGT GTG GTG ACA CTC 576
Ala Lys Asn Ser Gln Pro Asn Arg Tyr Thr Asn Arg Val Val Thr Leu
180 185 190

TGG TAC CGG CCC CCG GAG CTG TTG CTC GGG GAG CGG GAC TAC GGC CCC 624
Trp Tyr Arg Pro Pro Glu Leu Leu Leu Gly Glu Arg Asp Tyr Gly Pro
195 200 205

CCC ATT GAC CTG TGG GGT GCT GGG TGC ATC ATG GCA GAG ATG TGG ACC 672
Pro Ile Asp Leu Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr
210 215 220

CGC AGC CCC ATC ATG CAG GGC AAC ACG GAG CAG CAC CAA CTC GCC CTC 720
Arg Ser Pro Ile Met Gln Gly Asn Thr Glu Gln His Gln Leu Ala Leu
225 230 235 240

ATC AGT CAG CTC TGC GGC TCC ATC ACC CCT GAG GTG TGG CCA AAC GTG 768
Ile Ser Gln Leu Cys Gly Ser Ile Thr Pro Glu Val Trp Pro Asn Val
245 250 255

GAC AAC TAT GAG CTG TAC GAA AAG CTG GAG CTG GTC AAG GGC CAG AAG 816
Asp Asn Tyr Glu Leu Tyr Glu Lys Leu Glu Leu Val Lys Gly Gln Lys
260 265 270

CGG AAG GTG AAG GAC AGG CTG AAG GCC TAT GTG CGT GAC CCA TAC GCA 864
Arg Lys Val Lys Asp Arg Leu Lys Ala Tyr Val Arg Asp Pro Tyr Ala
275 280 285

CTG GAC CTC ATC GAC AAG CTG CTG GTG CTG GAC CCT GCC CAG CGC ATC 912
Leu Asp Leu Ile Asp Lys Leu Leu Val Leu Asp Pro Ala Gln Arg Ile
290 295 300

GAC AGC GAT GAC GCC CTC AAC CAC GAC TTC TTC TGG TCC GAC CCC ATG 960
Asp Ser Asp Asp Ala Leu Asn His Asp Phe Phe Trp Ser Asp Pro Met
305 310 315 320

CCC TCC GAC CTC AAG GGC ATG CTC TCC ACC CAC CTG ACG TCC ATG TTC 1008
Pro Ser Asp Leu Lys Gly Met Leu Ser Thr His Leu Thr Ser Met Phe
325 330 335

GAG TAC TTG GCA CCA CCG CGC CGG AAG GGC AGC CAG ATC ACC CAG CAG 1056
Glu Tyr Leu Ala Pro Pro Arg Arg Lys Gly Ser Gln Ile Thr Gln Gln
340 345 350

TCC ACC AAC CAG AGT CGC AAT CCC GCC ACC ACC AAC CAG ACG GAG TTT 1104
Ser Thr Asn Gln Ser Arg Asn Pro Ala Thr Thr Asn Gln Thr Glu Phe
355 360 365

GAG CGC GTC TTC TGA 1119
Glu Arg Val Phe
370 
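
The pairing of each nucleotide triplet with the residue printed beneath it in SEQ ID NO: 5 can be spot-checked by translating the coding region with the standard genetic code. The following Python sketch is illustrative only and is not part of the original listing: the function name `translate` and the use of the standard codon table are assumptions, and only the first and last rows of the sequence are used as input.

```python
# Illustrative sketch (not part of the original listing): translate codons
# with the standard genetic code to spot-check rows of SEQ ID NO: 5.
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string (spaces ignored) into one-letter residue codes."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# First and last rows of SEQ ID NO: 5 as printed in the listing.
first_row = "ATG GCA AAG CAG TAC GAC TCG GTG GAG TGC CCT TTT TGT GAT GAA GTT"
last_row = "GAG CGC GTC TTC TGA"

print(translate(first_row))
print(translate(last_row))
```

Under these assumptions the first row translates to MAKQYDSVECPFCDEV (Met Ala Lys Gln Tyr Asp Ser Val Glu Cys Pro Phe Cys Asp Glu Val) and the last row to ERVF followed by a stop codon, which agrees with the CDS location 1..1116 and the 372-residue product given as SEQ ID NO: 6.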



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 372 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Met Ala Lys Gln Tyr Asp Ser Val Glu Cys Pro Phe Cys Asp Glu Val
1 5 10 15

Ser Lys Tyr Glu Lys Leu Ala Lys Ile Gly Gln Gly Thr Phe Gly Glu
20 25 30

Val Phe Lys Ala Arg His Arg Lys Thr Gly Gln Lys Val Ala Leu Lys
35 40 45

Lys Val Leu Met Glu Asn Glu Lys Glu Gly Phe Pro Ile Thr Ala Leu
50 55 60

Arg Glu Ile Lys Ile Leu Gln Leu Leu Lys His Glu Asn Val Val Asn
65 70 75 80

Leu Ile Glu Ile Cys Arg Thr Lys Ala Ser Pro Tyr Asn Arg Cys Lys
85 90 95

Gly Ser Ile Tyr Leu Val Phe Asp Phe Cys Glu His Asp Leu Ala Gly
100 105 110

Leu Leu Ser Asn Val Leu Val Lys Phe Thr Leu Ser Glu Ile Lys Arg
115 120 125

Val Met Gln Met Leu Leu Asn Gly Leu Tyr Tyr Ile His Arg Asn Lys
130 135 140

Ile Leu His Arg Asp Met Lys Ala Ala Asn Val Leu Ile Thr Arg Asp
145 150 155 160

Gly Val Leu Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Ser Leu
165 170 175

Ala Lys Asn Ser Gln Pro Asn Arg Tyr Thr Asn Arg Val Val Thr Leu
180 185 190

Trp Tyr Arg Pro Pro Glu Leu Leu Leu Gly Glu Arg Asp Tyr Gly Pro
195 200 205

Pro Ile Asp Leu Trp Gly Ala Gly Cys Ile Met Ala Glu Met Trp Thr
210 215 220

Arg Ser Pro Ile Met Gln Gly Asn Thr Glu Gln His Gln Leu Ala Leu
225 230 235 240

Ile Ser Gln Leu Cys Gly Ser Ile Thr Pro Glu Val Trp Pro Asn Val
245 250 255

Asp Asn Tyr Glu Leu Tyr Glu Lys Leu Glu Leu Val Lys Gly Gln Lys
260 265 270

Arg Lys Val Lys Asp Arg Leu Lys Ala Tyr Val Arg Asp Pro Tyr Ala
275 280 285

Leu Asp Leu Ile Asp Lys Leu Leu Val Leu Asp Pro Ala Gln Arg Ile
290 295 300

Asp Ser Asp Asp Ala Leu Asn His Asp Phe Phe Trp Ser Asp Pro Met
305 310 315 320

Pro Ser Asp Leu Lys Gly Met Leu Ser Thr His Leu Thr Ser Met Phe
325 330 335

Glu Tyr Leu Ala Pro Pro Arg Arg Lys Gly Ser Gln Ile Thr Gln Gln
340 345 350

Ser Thr Asn Gln Ser Arg Asn Pro Ala Thr Thr Asn Gln Thr Glu Phe
355 360 365

Glu Arg Val Phe
370



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
ACGAATTCCA CACAATCCAA AGATC 25 



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CAGAATTCCT ATTGCCGATC CCCAGA 26 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(8, 14)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A or C or G or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "Y = C or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(17, 20)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "R = A or G"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GGAATTCNAT GYTNCARCAR CC 22 
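
The degenerate positions annotated for SEQ ID NO: 9 (N, Y and R) mean that the primer stands for a pool of concrete oligonucleotides. The following Python sketch is illustrative only and is not part of the original listing: the helper names `degeneracy` and `expand` are hypothetical, and the lookup table covers only the IUPAC ambiguity codes that appear in this listing.

```python
# Illustrative sketch (not part of the original listing): expand a degenerate
# primer written with IUPAC ambiguity codes into its concrete sequences.
from itertools import product

# Only the codes that occur in this listing are included.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "W": "AT",
    "S": "CG",
    "N": "ACGT",  # any base
}

def degeneracy(primer):
    """Number of distinct sequences represented by the primer (spaces ignored)."""
    count = 1
    for base in primer.replace(" ", ""):
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by the primer (spaces ignored)."""
    choices = [IUPAC[base] for base in primer.replace(" ", "")]
    return ["".join(p) for p in product(*choices)]

# SEQ ID NO: 9 as printed in the listing.
seq_id_9 = "GGAATTCNAT GYTNCARCAR CC"
variants = expand(seq_id_9)
print(degeneracy(seq_id_9), len(variants), variants[0])
```

With two N positions, one Y and two R positions, the pool for SEQ ID NO: 9 contains 4 x 2 x 4 x 2 x 2 = 128 sequences of 22 bases each. The same helpers apply to the other degenerate primers in the listing, such as SEQ ID NOS: 10 and 19 to 21.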



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(13, 16, 19, 22, 25)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "R = A or G"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
AACTGCAGTC CARAARAART CRTGRTT 27 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



TGTCAAGGAT CAAACCGGCT GTGAT 25



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CGAATTCCAA GAAACGCATC GATGC 25



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
AGACCTGCCA AATCGTGT 18



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AGAAGGTGGA TCTGTAACCA TTCGT 25



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GGAATTCAGA TCTCGATCAG ATTCA 25



(2) INFORMATION FOR SEQ ID NO: 16: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
TTACTACTCG AGCTACCAAA CCCGGTC 27



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TAAGCAAGCT TCTATGGCGC ACATGTCC 28



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
TTACTACTCG AGCTACCAAA CCCGGTC 27



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(13, 16, 22)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "Y = C or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 17

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "W = A or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 18

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "S = C or G"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 19

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A or C or G or T"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GGAATTCTGG TAYTTYWSNA AYGA 24 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 11

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "Y = C or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 14

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "R = A or G"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(17, 20)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A or C or G or T"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

CGGGATCCTG YTCRAANGGN GGCAT 25 



(2) INFORMATION FOR SEQ ID NO: 21: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: one-of(11, 14, 20)

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A or C or G or T"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 23

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "R = A or G"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

CGGGATCCAA NGGNGGCATN CCRT 24 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 
ATCACGACAC CACCAGAGCT GTTA 24 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
CGAATTCAGA TCGTGAACGG GA 22 



(2) INFORMATION FOR SEQ ID NO: 24: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CGAATTCAGG CGCTAGCAAT G 21



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
GAAAGGCGTA GAACCGA 17



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
GCTGACCCAT TTCCTGTATC AGATAG 26



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
GGAATTCTTC TGCTTGGCGA AT 22



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
GGGAATTCGA GGTTCTATAC ATAT 24 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CTGTGTGAAT GGAATCTGTG ATGTG 25 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

TATCCCGGGT CATATGAGTC TCCTAGCC 28



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Met Leu Gln Gln Pro Ser Gly Ser Thr Pro Ser Asn Val
1 5 10



(2) INFORMATION FOR SEQ ID NO:32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

Ala Asp Thr Ala Leu Asn His Asp Phe Phe Trp Thr Asp Pro Met Pro 
1 5 10 15

Ser 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

Met Leu Gln Gln Pro
1 5 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

Asn His Asp Phe Phe Trp Thr 
1 5 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Ser Pro Glu Trp Pro Asp Ile
1 5 



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:



(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

Trp Tyr Phe Ser Asn Asp Gln Leu Ala Asn Ser Pro Ser Arg
1 5 10



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

Thr Val His Gly Met Pro Pro Phe Glu Gln Gln Leu Pro Tyr
1 5 10



(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

Trp Tyr Phe Ser Asn Asp 
1 5 



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Met Pro Pro Phe Glu Gln
1 5 



(2) INFORMATION FOR SEQ ID NO: 40:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

His Gly Met Pro Pro Phe 
1 5 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 
GCAGGATCCA GAATTCCATA TGGCAAAGCA GTACGACTCG G 41 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 
CAGTACTCGA GTTATCAGAA GACGCGCTCA AAC 33 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4528 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 
GGGGGGGGGG GGGTGAATGA AGGAGCGGGC GGAGGAGGAA TTGTCATGGC GTCGGGCCGT 60 
GGAGCTTCTT CTCGCTGGTT CTTTACTCGG GAACAGCTGG AGAACACGCC GAGCCGCCGC 120 



TGCGGAGTGG AGGCGGATAA AGAGCTCTCG TGCCGCCAGC AGGCGGCCAA CCTCATCCAG 180 



GAGATGGGAC AGCGTCTCAA TGTCTCTCAG CTTACAATAA ACACTGCGAT TGTTTATATG 240 

CACAGGTTTT ATATGCACCA TTCTTTCACC AAATTCAACA AAAATATAAT ATCGTCTACT 300 

GCATTATTTT TGGCTGCAAA AGTGGAAGAA CAGGCTCGAA AACTTGAACA TGTTATCAAA 360 

GTAGCACATG CTTGTCTTCA TCCTCTAGAG CCACTGCTGG ATACTAAATG TGATGCTTAC 420 

CTTCAACAGA CTCAAGAACT GGTTATACTT GAAACCATAA TGCTACAAAC TCTAGGTTTT 480 

GAGATCACCA TTGAACACCC ACACACAGAT GTGGTGAAAT GTACCCAGTT AGTAAGAGCA 540 

AGCAAGGATT TGGCACAGAC ATCCTATTTC ATGGCTACCA ACAGTCTGCA TCTTACAACC 600 

TTCTGTCTTC AGTACAAACC AACAGTGATA GCATGTGTAT GCATTCATTT GGCTTGCAAA 660

TGGTCCAATT GGGAGATCCC TGTATCAACT GATGGAAAGC ATTGGTGGGA ATATGTGGAT 720

CCTACAGTTA CTCTAGAATT ATTAGATGAG CTAACACATG AGTTTCTACA AATATTGGAG 780 

AAAACGCCTA ATAGGTTGAA GAAGATTCGA AACTGGAGGG CTAATCAGGC AGCTAGGAAA 840 

CCAAAAGTAG ATGGACAGGT ATCAGAGACA CCACTTCTTG GTTCATCTTT GGTCCAGAAT 900 

TCCATTTTAG TAGATAGTGT CACTGGTGTG CCTACAAACC CAAGTTTTCA GAAACCATCT 960

ACATCAGCAT TCCCTGCGCC AGTACCTCTA AATTCAGGAA ATATTTCTGT TCAAGACAGC 1020 

CATACATCTG ATAATTTGTC AATGCTAGCA ACAGGAATGC CAAGTACTTC ATACGGTTTA 1080 

TCATCACACC AGGAATGGCC TCAACATCAA GACTCAGCAA GGACAGAACA GCTATATTCA 1140 

CAGAAACAGG AGACATCTTT GTCTGGTAGC CAGTACAACA TCAACTTCCA GCAGGGACCT 1200 

TCTATATCAC TGCATTCAGG ATTACATCAC AGACCTGACA AAATTTCAGA TCATTCTTCT 1260

GTTAAGCAAG AATATACTCA TAAAGCAGGG AGCAGTAAAC ACCATGGGCC AATTTCCACT 1320 

ACTCCAGGAA TAATTCCTCA GAAAATGTCT TTAGATAAAT ATAGAGAAAA GCGTAAACTA 1380 

GAAACTCTTG ATCTCGATGT AAGGGATCAT TATATAGCTG CCCAGGTAGA ACAGCAGCAC 1440 

AAACAAGGGC AGTCACAGGC AGCCAGCAGC AGTTCTGTTA CTTCTCCCAT TAAAATGAAA 1500 

ATACCTATCG CAAATACTGA AAAATACATG GCAGATAAAA AGGAAAAGAG TGGGTCACTG 1560 

AAATTACGGA TTCCAATACC ACCCACTGAT AAAAGCGCCA GTAAAGAAGA ACTGAAAATG 1620 

AAAATAAAAG TTTCTTCTTC AGAAAGACAC AGCTCTTCTG ATGAAGGCAG TGGGAAAAGC 1680 

AAACATTCAA GCCCACATAT TAGCAGAGAC CATAAGGAGA AGCACAAGGA GCATCCTTCA 1740 

AGCCGCCACC ACACCAGCAG CCACAAGCAT TCCCACTCGC ATAGTGGCAG CAGCAGCGGT 1800 



GGCAGTAAAC ACAGTGCCGA CGGAATACCA CCCACTGTTC TGAGGAGTCC TGTTGGCCTG 1860 

AGCAGTGATG GCATTTCCTC TAGCTCCAGC TCTTCAAGGA AGAGGCTGCA TGTCAATGAT 1920 

GCATCTCACA ACCACCACTC CAAAATGAGC AAAAGTTCCA AAAGTTCAGG TGGGCTACGG 1980 

ACATCTCAGC ACCTCGTGAA ACTGGACAAG AAGCCAGTGG AGACCAACGG TCCTGATGCC 2040 

AATCACGAGT ACAGTACAAG CAGCCAGCAT ATGGACTACA AAGACACATT CGACATGCTG 2100 

GACTCACTGT TAAGTGCCCA AGGAATGAAC ATGTAATAAT TTGTTTAGGT CAATTTTTCC 2160 

TTTACTTTTT TAATTTAAAA ATTGTTAGAA TGGAAAAATT CCTTCTGATC TAGCAGTGGT 2220

AACCCCTGCT GTTGCTGCCA CTGCTTCAAT ATTTGTAAGT GCTACTTTAT TCTTCATTCT 2280 

GAAAAGAAGA GATTATAGTA AACAAGTCTT TATCTCCACA TATGATAGTG TTATAAATAC 2340 

TGTAAAGGCA TGGAAGGTGC AAAACTCAGT ATTTCTACAA TTGCAGCTAA GAACATTAGG 2400 

ATGAATGGCT GGCTGCTTCT AGGAATATAA GATGCCTCAA GCATTCATTA TTTATGATTT 2460 

GAATACTGTA GCTATTTTTT GTTGCTTGGC TTTTGAATGA GTGTAAATTG TTTTCTTTTG 2520 

TGTATTTATA CTTGTATGTA TGATTTGCAT GTTTCAATGA TAAAGGGATA AAACAGTATA 2580

CTGACAACTG TTTACAAGAA AGTGGAGAAA ATGTACTACA TTTTGTATGT TTAGATATTA 2640 

CCGTAAATAC TCAGGATTGG AGCTGCTTGT AAGTATAACA ATATACAGAA TACTTTATTT 2700 

TATCTTGTCA GAGTTCCATC ACTATCTAAA ACAAAGGTGC AATTTTTTAT GTTAACCTTA 2760 

AATCTAGCCC TTACTGGAAG CCACTGATAG GGACATTCAC TACCAGATGT GTGCAGTGCA 2820 

GCAGATGGTC ATATAACACT GTGAGGCACT GAATTTTGCC TTCAGAGGTT CTGACCAGAT 2880 

TGGCTGCTGA AATAGCCCCT AACTTTCTGA AGGCTTGAAG AGGAAAAAAT AAAGTTTACA 2940 

TACTCTTGAT GGAAGTGCAT TTAAATGTTT GTTGGCTTGT TGCAGTTCTA TGAAACAGAG 3000 

CTGTTAATAA TGGTTATGTG GATTACTGTG ATTTGAAAAC TAAATTCACA ATAACTTACC 3060 

TAGTAGAGAT TTAGTGAGTT GTTTCCTTTA AAGAATTTTA CACTACATAT TTTAATAGTA 3120 

AACAGGGTCA CTTTCCTTTA GCATTCAGAA TGACACCATA TTCTTAAATA TACTCCTTCC 3180 

CTGAAGCGTG TTTGTGTGTG ATGCCATATT TCTTTTTCAG GTAAATGTAG TCTTCCTTAT 3240 

AAAAATGAAA TTAAACCTAT GCTCTCAATT CTTTTATATT CTAACAATAA ATAAAAAAGA 3300

AAAGATTACT GACTGTGCAT TGTACCTGTA TTTATAGTTT ATGGTTATCA GAAGCTCTGT 3360 

AAGAAAGAAA AGGTCAGCTC CCAGGCAAAC CAGTAGTGGA GGTTTTACAT TTGTTTGCAC 3420 



ATCTCAGTAT ATTTCTGTTG AGGTAAAGTT TGCACAGTCA TCTGACTTCT GATCAAGCAT 3480 

TAGATTTTAA CTTGTTTAGA TTTTGTCTTA AACACCAGTA ATATGGCTCT TGTTTATCAG 3540

CTAATCTTGA ATTTATTCTG TGGTAAATCT TTTGAGTTGC TGAGTATATT TGAGATTGAT 3600 

TGGATTCAAC CTCTTGTTGA ACTGAAAACT TAATTTTTTC TCTGTATTTT TGTTACAAAG 3660 

CCACTGATAC GTGCACAATT GTAATTAAGT ATGTTGCAGT TGTAAATATT AGAGTTTAAT 3720

CTCATGCTCT ACCTTTATTT AGCAATTACC TAATTTGCCA GTAGCTTTAT AATTTTTAAA 3780 

GATAATTGTT CATTATTTTG TCAATGTTAT TTGAACTTGG GGTACTTAGG AGCCTCTTTG 3840 

TAGGGACTGT GCCTAGGTAG CATGTCCTAA CATTTGTTCT GGTCTTGCAT AACTTCAGTA 3900 

TCTTTGTCAT TATATGTAAC TTTGTTGCTC TGTATGGCAT AATATTGTAT CCATAAACAT 3960 

GGTAATTTTG ATACAGTTAT ACTTTTACAG TGGTACATAA TCCAAGGACT AGTATAGAAT 4020 

TAAGCTGAGT GCAAGATGAG GGAGGGAAGG GCTTTCTTGG TAATTTAGAT GTGAAACCTC 4080 

TACAGAGCTA TCATGTAAAA ACTACATGAG GTGGTTGTGC TACTGTATAA TTGGGGGTGA 4140 

TAATACCAGG AATTTTAATA AGATTTTGTA AAGAATATCC AGAAAAGTAG TGAACTTATT 4200

TTCAGTAGGC ATAGAAAACA ATGTGAATAT TTAAGGTCTG TGACTATAGT TAAACTTCAC 4260 

TAAGAATTTG CAGAATTGTT TTGAGATGTG TGAATAAAGG TAATTTTATT GAATCTTCAT 4320 

TGGTGCTAAT GTTGGACAGT TAAAAAGATA GCTAGTGTAT ATTGTTATGG GTCAGTACTT 4380 

ATTAGTACTT CCAAAATTGA ATTTGAAATG CTATGTATTC ACTTTTCACT CTGTAAATGT 4440 

AATTCTTTAC AATGACTTTA TTTATTAAAG GGCAGCCAGT TGTCATTTGT AAAAAAAAAA 4500 

AAAAAAAAAA AAAGCGGCCG CTGAATTC 4528 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2091 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

ATGGCGTCGG GCCGTGGAGC TTCTTCTCGC TGGTTCTTTA CTCGGGAACA GCTGGAGAAC 60 

ACGCCGAGCC GCCGCTGCGG AGTGGAGGCG GATAAAGAGC TCTCGTGCCG CCAGCAGGCG 120 



GCCAACCTCA TCCAGGAGAT GGGACAGCGT CTCAATGTCT CTCAGCTTAC AATAAACACT 180 

GCGATTGTTT ATATGCACAG GTTTTATATG CACCATTCTT TCACCAAATT CAACAAAAAT 240 

ATAATATCGT CTACTGCATT ATTTTTGGCT GCAAAAGTGG AAGAACAGGC TCGAAAACTT 300

GAACATGTTA TCAAAGTAGC ACATGCTTGT CTTCATCCTC TAGAGCCACT GCTGGATACT 360

AAATGTGATG CTTACCTTCA ACAGACTCAA GAACTGGTTA TACTTGAAAC CATAATGCTA 420 

CAAACTCTAG GTTTTGAGAT CACCATTGAA CACCCACACA CAGATGTGGT GAAATGTACC 480 

CAGTTAGTAA GAGCAAGCAA GGATTTGGCA CAGACATCCT ATTTCATGGC TACCAACAGT 540 

CTGCATCTTA CAACCTTCTG TCTTCAGTAC AAACCAACAG TGATAGCATG TGTATGCATT 600 

CATTTGGCTT GCAAATGGTC CAATTGGGAG ATCCCTGTAT CAACTGATGG AAAGCATTGG 660 

TGGGAATATG TGGATCCTAC AGTTACTCTA GAATTATTAG ATGAGCTAAC ACATGAGTTT 720 

CTACAAATAT TGGAGAAAAC GCCTAATAGG TTGAAGAAGA TTCGAAACTG GAGGGCTAAT 780 

CAGGCAGCTA GGAAACCAAA AGTAGATGGA CAGGTATCAG AGACACCACT TCTTGGTTCA 840

TCTTTGGTCC AGAATTCCAT TTTAGTAGAT AGTGTCACTG GTGTGCCTAC AAACCCAAGT 900 

TTTCAGAAAC CATCTACATC AGCATTCCCT GCGCCAGTAC CTCTAAATTC AGGAAATATT 960 

TCTGTTCAAG ACAGCCATAC ATCTGATAAT TTGTCAATGC TAGCAACAGG AATGCCAAGT 1020 

ACTTCATACG GTTTATCATC ACACCAGGAA TGGCCTCAAC ATCAAGACTC AGCAAGGACA 1080 

GAACAGCTAT ATTCACAGAA ACAGGAGACA TCTTTGTCTG GTAGCCAGTA CAACATCAAC 1140

TTCCAGCAGG GACCTTCTAT ATCACTGCAT TCAGGATTAC ATCACAGACC TGACAAAATT 1200 

TCAGATCATT CTTCTGTTAA GCAAGAATAT ACTCATAAAG CAGGGAGCAG TAAACACCAT 1260 

GGGCCAATTT CCACTACTCC AGGAATAATT CCTCAGAAAA TGTCTTTAGA TAAATATAGA 1320 

GAAAAGCGTA AACTAGAAAC TCTTGATCTC GATGTAAGGG ATCATTATAT AGCTGCCCAG 1380 

GTAGAACAGC AGCACAAACA AGGGCAGTCA CAGGCAGCCA GCAGCAGTTC TGTTACTTCT 1440 

CCCATTAAAA TGAAAATACC TATCGCAAAT ACTGAAAAAT ACATGGCAGA TAAAAAGGAA 1500

AAGAGTGGGT CACTGAAATT ACGGATTCCA ATACCACCCA CTGATAAAAG CGCCAGTAAA 1560 

GAAGAACTGA AAATGAAAAT AAAAGTTTCT TCTTCAGAAA GACACAGCTC TTCTGATGAA 1620

GGCAGTGGGA AAAGCAAACA TTCAAGCCCA CATATTAGCA GAGACCATAA GGAGAAGCAC 1680 

AAGGAGCATC CTTCAAGCCG CCACCACACC AGCAGCCACA AGCATTCCCA CTCGCATAGT 1740 



GGCAGCAGCA GCGGTGGCAG TAAACACAGT GCCGACGGAA TACCACCCAC TGTTCTGAGG 1800

AGTCCTGTTG GCCTGAGCAG TGATGGCATT TCCTCTAGCT CCAGCTCTTC AAGGAAGAGG 1860

CTGCATGTCA ATGATGCATC TCACAACCAC CACTCCAAAA TGAGCAAAAG TTCCAAAAGT 1920

TCAGGTGGGC TACGGACATC TCAGCACCTC GTGAAACTGG ACAAGAAGCC AGTGGAGACC 1980 

AACGGTCCTG ATGCCAATCA CGAGTACAGT ACAAGCAGCC AGCATATGGA CTACAAAGAC 2040 

ACATTCGACA TGCTGGACTC ACTGTTAAGT GCCCAAGGAA TGAACATGTA A 2091 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 696 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 

Met Ala Ser Gly Arg Gly Ala Ser Ser Arg Trp Phe Phe Thr Arg Glu
1 5 10 15

Gln Leu Glu Asn Thr Pro Ser Arg Arg Cys Gly Val Glu Ala Asp Lys
20 25 30

Glu Leu Ser Cys Arg Gln Gln Ala Ala Asn Leu Ile Gln Glu Met Gly
35 40 45

Gln Arg Leu Asn Val Ser Gln Leu Thr Ile Asn Thr Ala Ile Val Tyr
50 55 60

Met His Arg Phe Tyr Met His His Ser Phe Thr Lys Phe Asn Lys Asn
65 70 75 80

Ile Ile Ser Ser Thr Ala Leu Phe Leu Ala Ala Lys Val Glu Glu Gln
85 90 95

Ala Arg Lys Leu Glu His Val Ile Lys Val Ala His Ala Cys Leu His
100 105 110

Pro Leu Glu Pro Leu Leu Asp Thr Lys Cys Asp Ala Tyr Leu Gln Gln
115 120 125

Thr Gln Glu Leu Val Ile Leu Glu Thr Ile Met Leu Gln Thr Leu Gly
130 135 140

Phe Glu Ile Thr Ile Glu His Pro His Thr Asp Val Val Lys Cys Thr
145 150 155 160

Gln Leu Val Arg Ala Ser Lys Asp Leu Ala Gln Thr Ser Tyr Phe Met
165 170 175

Ala Thr Asn Ser Leu His Leu Thr Thr Phe Cys Leu Gln Tyr Lys Pro
180 185 190

Thr Val Ile Ala Cys Val Cys Ile His Leu Ala Cys Lys Trp Ser Asn
195 200 205

Trp Glu Ile Pro Val Ser Thr Asp Gly Lys His Trp Trp Glu Tyr Val
210 215 220

Asp Pro Thr Val Thr Leu Glu Leu Leu Asp Glu Leu Thr His Glu Phe
225 230 235 240

Leu Gln Ile Leu Glu Lys Thr Pro Asn Arg Leu Lys Lys Ile Arg Asn
245 250 255

Trp Arg Ala Asn Gln Ala Ala Arg Lys Pro Lys Val Asp Gly Gln Val
260 265 270

Ser Glu Thr Pro Leu Leu Gly Ser Ser Leu Val Gln Asn Ser Ile Leu
275 280 285

Val Asp Ser Val Thr Gly Val Pro Thr Asn Pro Ser Phe Gln Lys Pro
290 295 300

Ser Thr Ser Ala Phe Pro Ala Pro Val Pro Leu Asn Ser Gly Asn Ile
305 310 315 320

Ser Val Gln Asp Ser His Thr Ser Asp Asn Leu Ser Met Leu Ala Thr
325 330 335

Gly Met Pro Ser Thr Ser Tyr Gly Leu Ser Ser His Gln Glu Trp Pro
340 345 350

Gln His Gln Asp Ser Ala Arg Thr Glu Gln Leu Tyr Ser Gln Lys Gln
355 360 365

Glu Thr Ser Leu Ser Gly Ser Gln Tyr Asn Ile Asn Phe Gln Gln Gly
370 375 380

Pro Ser Ile Ser Leu His Ser Gly Leu His His Arg Pro Asp Lys Ile
385 390 395 400

Ser Asp His Ser Ser Val Lys Gln Glu Tyr Thr His Lys Ala Gly Ser
405 410 415

Ser Lys His His Gly Pro Ile Ser Thr Thr Pro Gly Ile Ile Pro Gln
420 425 430

Lys Met Ser Leu Asp Lys Tyr Arg Glu Lys Arg Lys Leu Glu Thr Leu
435 440 445

Asp Leu Asp Val Arg Asp His Tyr Ile Ala Ala Gln Val Glu Gln Gln
450 455 460

His Lys Gln Gly Gln Ser Gln Ala Ala Ser Ser Ser Ser Val Thr Ser
465 470 475 480

Pro Ile Lys Met Lys Ile Pro Ile Ala Asn Thr Glu Lys Tyr Met Ala
485 490 495

Asp Lys Lys Glu Lys Ser Gly Ser Leu Lys Leu Arg Ile Pro Ile Pro
500 505 510

Pro Thr Asp Lys Ser Ala Ser Lys Glu Glu Leu Lys Met Lys Ile Lys
515 520 525

Val Ser Ser Ser Glu Arg His Ser Ser Ser Asp Glu Gly Ser Gly Lys
530 535 540

Ser Lys His Ser Ser Pro His Ile Ser Arg Asp His Lys Glu Lys His
545 550 555 560

Lys Glu His Pro Ser Ser Arg His His Thr Ser Ser His Lys His Ser
565 570 575

His Ser His Ser Gly Ser Ser Ser Gly Gly Ser Lys His Ser Ala Asp
580 585 590

Gly Ile Pro Pro Thr Val Leu Arg Ser Pro Val Gly Leu Ser Ser Asp
595 600 605

Gly Ile Ser Ser Ser Ser Ser Ser Ser Arg Lys Arg Leu His Val Asn
610 615 620

Asp Ala Ser His Asn His His Ser Lys Met Ser Lys Ser Ser Lys Ser
625 630 635 640

Ser Gly Gly Leu Arg Thr Ser Gln His Leu Val Lys Leu Asp Lys Lys
645 650 655

Pro Val Glu Thr Asn Gly Pro Asp Ala Asn His Glu Tyr Ser Thr Ser
660 665 670

Ser Gln His Met Asp Tyr Lys Asp Thr Phe Asp Met Leu Asp Ser Leu
675 680 685

Leu Ser Ala Gln Gly Met Asn Met
690 695



(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2190 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

ATGGCGTCGG GCCGTGGAGC TTCTTCTCGC TGGTTCTTTA CTCGGGAACA GCTGGAGAAC 60 

ACGCCGAGCC GCCGCTGCGG AGTGGAGGCG GATAAAGAGC TCTCGTGCCG CCAGCAGGCG 120 

GCCAACCTCA TCCAGGAGAT GGGACAGCGT CTCAATGTCT CTCAGCTTAC AATAAACACT 180 

GCGATTGTTT ATATGCACAG GTTTTATATG CACCATTCTT TCACCAAATT CAACAAAAAT 240 

ATAATATCGT CTACTGCATT ATTTTTGGCT GCAAAAGTGG AAGAACAGGC TCGAAAACTT 300

GAACATGTTA TCAAAGTAGC ACATGCTTGT CTTCATCCTC TAGAGCCACT GCTGGATACT 360 

AAATGTGATG CTTACCTTCA ACAGACTCAA GAACTGGTTA TACTTGAAAC CATAATGCTA 420 

CAAACTCTAG GTTTTGAGAT CACCATTGAA CACCCACACA CAGATGTGGT GAAATGTACC 480

CAGTTAGTAA GAGCAAGCAA GGATTTGGCA CAGACATCCT ATTTCATGGC TACCAACAGT 540 

CTGCATCTTA CAACCTTCTG TCTTCAGTAC AAACCAACAG TGATAGCATG TGTATGCATT 600

CATTTGGCTT GCAAATGGTG CAATTGGGAG ATCCCTGTAT CAACTGATGG AAAGCATTGG 660 

TGGGAATATG TGGATCCTAC AGTTACTCTA GAATTATTAG ATGAGCTAAC ACATGAGTTT 720 

CTACAAATAT TGGAGAAAAC GCCTAATAGG TTGAAGAAGA TTCGAAACTG GAGGGCTAAT 780 

CAGGCAGCTA GGAAACCAAA AGTAGATGGA CAGGTATCAG AGACACCACT TCTTGGTTCA 840 

TCTTTGGTCC AGAATTCCAT TTTAGTAGAT AGTGTCACTG GTGTGCCTAC AAACCCAAGT 900 

TTTCAGAAAC CATCTACATC AGCATTCCCT GCGCCAGTAC CTCTAAATTC AGGAAATATT 960 

TCTGTTCAAG ACAGCCATAC ATCTGATAAT TTGTCAATGC TAGCAACAGG AATGCCAAGT 1020 

ACTTCATACG GTTTATCATC ACACCAGGAA TGGCCTCAAC ATCAAGACTC AGCAAGGACA 1080 

GAACAGCTAT ATTCACAGAA ACAGGAGACA TCTTTGTCTG GTAGCCAGTA CAACATCAAC 1140 

TTCCAGCAGG GACCTTCTAT ATCACTGCAT TCAGGATTAC ATCACAGACC TGACAAAATT 1200 

TCAGATCATT CTTCTGTTAA GCAGGAATAT ACTCATAAAG CAGGGAGCAG TAAACACCAT 1260 

GGGCCAATTT CCACTACTCC AGGAATAATT CCTCAGAAAA TGTCTTTAGA TAAATATAGA 1320 

GAAAAGCGTA AACTAGAAAC TCTTGATCTC GATGTAAGGG ATCATTATAT AGCTGCCCAG 1380 

GTAGAACAGC AGCACAAACA AGGGCAGTCA CAGGCAGCCA GCAGCAGTTC TGTTACTTCT 1440 

CCCATTAAAA TGAAAATACC TATCGCAAAT ACTGAAAAAT ACATGGCAGA TAAAAAGGAA 1500 



AAGAGTGGGT CACTGAAATT ACGGATTCCA ATACCACCCA CTGATAAAAG CGCCAGTAAA 1560

GAAGAACTGA AAATGAAAAT AAAAGTTTCT TCTTCAGAAA GACACAGCTC TTCTGATGAA 1620

GGCAGTGGGA AAAGCAAACA TTCAAGCCCA CATATTAGCA GAGACCATAA GGAGAAGCAC 1680

AAGGAGCATC CTTCAAGCCG CCACCACACC AGCAGCCACA AGCATTCCCA CTCGCATAGT 1740

GGCAGCAGCA GCGGTGGCAG TAAACACAGT GCCGACGGAA TACCACCCAC TGTTCTGAGG 1800

AGTCCTGTTG GCCTGAGCAG TGATGGCATT TCCTCTAGCT CCAGCTCTTC AAGGAAGAGG 1860

CTGCATGTCA ATGATGCATC TCACAACCAC CACTCCAAAA TGAGCAAAAG TTCCAAAAGT 1920

TCAGGTAGTT CATCTAGTTC TTCCTCCTCT GTTAAGCAGT ATATATCCTC TCACAACTCT 1980

GTTTTTAACC ATCCCTTACC CCTCCTCCCC TGTCACATAC CAGGTGGGCT ACGGACATCT 2040

CTGCACCTCG TGAAACTGGA CAAGAAGCGA GTGGAGACCA ACGGTCCTGA TGCCAATCAC 2100

GAGTACAGTA CAAGCAGCCA GCATATGGAC TACAAAGACA CATTCGACAT GCTGGACTCA 2160

CTGTTAAGTG CCCAAGGAAT GAACATGTAA 2190



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 729 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

Met Ala Ser Gly Arg Gly Ala Ser Ser Arg Trp Phe Phe Thr Arg Glu
1 5 10 15

Gln Leu Glu Asn Thr Pro Ser Arg Arg Cys Gly Val Glu Ala Asp Lys
20 25 30

Glu Leu Ser Cys Arg Gln Gln Ala Ala Asn Leu Ile Gln Glu Met Gly
35 40 45

Gln Arg Leu Asn Val Ser Gln Leu Thr Ile Asn Thr Ala Ile Val Tyr
50 55 60

Met His Arg Phe Tyr Met His His Ser Phe Thr Lys Phe Asn Lys Asn
65 70 75 80

Ile Ile Ser Ser Thr Ala Leu Phe Leu Ala Ala Lys Val Glu Glu Gln
85 90 95

Ala Arg Lys Leu Glu His Val Ile Lys Val Ala His Ala Cys Leu His
100 105 110

Pro Leu Glu Pro Leu Leu Asp Thr Lys Cys Asp Ala Tyr Leu Gln Gln
115 120 125

Thr Gln Glu Leu Val Ile Leu Glu Thr Ile Met Leu Gln Thr Leu Gly
130 135 140

Phe Glu Ile Thr Ile Glu His Pro His Thr Asp Val Val Lys Cys Thr
145 150 155 160

Gln Leu Val Arg Ala Ser Lys Asp Leu Ala Gln Thr Ser Tyr Phe Met
165 170 175

Ala Thr Asn Ser Leu His Leu Thr Thr Phe Cys Leu Gln Tyr Lys Pro
180 185 190

Thr Val Ile Ala Cys Val Cys Ile His Leu Ala Cys Lys Trp Ser Asn
195 200 205

Trp Glu Ile Pro Val Ser Thr Asp Gly Lys His Trp Trp Glu Tyr Val
210 215 220

Asp Pro Thr Val Thr Leu Glu Leu Leu Asp Glu Leu Thr His Glu Phe
225 230 235 240

Leu Gln Ile Leu Glu Lys Thr Pro Asn Arg Leu Lys Lys Ile Arg Asn
245 250 255

Trp Arg Ala Asn Gln Ala Ala Arg Lys Pro Lys Val Asp Gly Gln Val
260 265 270

Ser Glu Thr Pro Leu Leu Gly Ser Ser Leu Val Gln Asn Ser Ile Leu
275 280 285

Val Asp Ser Val Thr Gly Val Pro Thr Asn Pro Ser Phe Gln Lys Pro
290 295 300

Ser Thr Ser Ala Phe Pro Ala Pro Val Pro Leu Asn Ser Gly Asn Ile
305 310 315 320

Ser Val Gln Asp Ser His Thr Ser Asp Asn Leu Ser Met Leu Ala Thr
325 330 335

Gly Met Pro Ser Thr Ser Tyr Gly Leu Ser Ser His Gln Glu Trp Pro
340 345 350

Gln His Gln Asp Ser Ala Arg Thr Glu Gln Leu Tyr Ser Gln Lys Gln
355 360 365

Glu Thr Ser Leu Ser Gly Ser Gln Tyr Asn Ile Asn Phe Gln Gln Gly
370 375 380

Pro Ser Ile Ser Leu His Ser Gly Leu His His Arg Pro Asp Lys Ile
385 390 395 400

Ser Asp His Ser Ser Val Lys Gln Glu Tyr Thr His Lys Ala Gly Ser
405 410 415

Ser Lys His His Gly Pro Ile Ser Thr Thr Pro Gly Ile Ile Pro Gln
420 425 430

Lys Met Ser Leu Asp Lys Tyr Arg Glu Lys Arg Lys Leu Glu Thr Leu
435 440 445

Asp Leu Asp Val Arg Asp His Tyr Ile Ala Ala Gln Val Glu Gln Gln
450 455 460

His Lys Gln Gly Gln Ser Gln Ala Ala Ser Ser Ser Ser Val Thr Ser
465 470 475 480

Pro Ile Lys Met Lys Ile Pro Ile Ala Asn Thr Glu Lys Tyr Met Ala
485 490 495

Asp Lys Lys Glu Lys Ser Gly Ser Leu Lys Leu Arg Ile Pro Ile Pro
500 505 510

Pro Thr Asp Lys Ser Ala Ser Lys Glu Glu Leu Lys Met Lys Ile Lys
515 520 525

Val Ser Ser Ser Glu Arg His Ser Ser Ser Asp Glu Gly Ser Gly Lys
530 535 540

Ser Lys His Ser Ser Pro His Ile Ser Arg Asp His Lys Glu Lys His
545 550 555 560

Lys Glu His Pro Ser Ser Arg His His Thr Ser Ser His Lys His Ser
565 570 575

His Ser His Ser Gly Ser Ser Ser Gly Gly Ser Lys His Ser Ala Asp
580 585 590

Gly Ile Pro Pro Thr Val Leu Arg Ser Pro Val Gly Leu Ser Ser Asp
595 600 605

Gly Ile Ser Ser Ser Ser Ser Ser Ser Arg Lys Arg Leu His Val Asn
610 615 620

Asp Ala Ser His Asn His His Ser Lys Met Ser Lys Ser Ser Lys Ser
625 630 635 640

Ser Gly Ser Ser Ser Ser Ser Ser Ser Ser Val Lys Gln Tyr Ile Ser
645 650 655

Ser His Asn Ser Val Phe Asn His Pro Leu Pro Leu Leu Pro Cys His
660 665 670

Ile Pro Gly Gly Leu Arg Thr Ser Gln His Leu Val Lys Leu Asp Lys
675 680 685

Lys Pro Val Glu Thr Asn Gly Pro Asp Ala Asn His Glu Tyr Ser Thr
690 695 700

Ser Ser Gln His Met Asp Tyr Lys Asp Thr Phe Asp Met Leu Asp Ser
705 710 715 720

Leu Leu Ser Ala Gln Gly Met Asn Met
725

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2360 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 

GGAAGTGCCT GCAACCTTCG CCGCTGCCTT CTGGTTGAAG CACTATGGAG GGAGAGAGGA 60 

AGAACAACAA CAAACGGTGG TATTTCACTC GAGAACAGCT GGAAAATAGC CCATCCCGTC 120 

GTTTTGGCGT GGACCCAGAT AAAGAACTTT CTTATCGCCA GCAGGCGGCC AATCTGCTTC 180 

AGGACATGGG GCAGCGTCTT AACGTCTCAC AATTGACTAT CAACACTGCT ATAGTATACA 240 

TGCATCGATT CTACATGATT CAGTCCTTCA CACGGTTCCC TGGAAATTCT GTGGCTCCAG 300 

CAGCCTTGTT TCTAGCAGCT AAAGTGGAGG AGCAGCCCAA AAAATTGGAA CATGTCATCA 360 

AGGTAGCACA TACTTGTCTC CATCCTCAGG AATCCCTTCC TGATACTAGA AGTGAGGCTT 420

ATTTGCAACA AGTTCAAGAT CTGGTCATTT TAGAAAGCAT AATTTTGCAG ACTTTAGGCT 480 

TTGAACTAAC AATTGATCAC CCACATACTC ATGTAGTAAA GTGCACTCAA CTTGTTCGAG 540 

CAAGCAAGGA CTTAGCACAG ACTTCTTACT TCATGGCAAC CAACAGCCTG CATTTGACCA 600 

CATTTAGCCT GCAGTACACA CCTCCTGTGG TGGCCTGTGT CTGCATTCAC CTGGCTTGCA 660 

AGTGGTCCAA TTGGGAGATC CCAGTCTCAA CTGACGGGAA GCACTGGTGG GAGTATGTTG 720 

ACGCCACTGT GACCTTGGAA CTTTTAGATG AACTGACACA TGAGTTTCTA CAGATTTTGG 780 

AGAAAACTCC CAACAGGCTC AAACGCATTT GGAATTGGAG GGCATGCGAG GCTGCCAAGA 840 

AAACAAAAGC AGATGACCGA GGAACAGATG AAAAGACTTC AGAGCAGACA ATCCTCAATA 900 

TGATTTCCCA GAGCTCTTCA GACACAACCA TTGCAGGTTT AATGAGCATG TCAACTTCTA 960



CCACAAGTGC AGTGCCTTCC CTGCCAGTCT CCGAAGAGTC ATCCAGCAAC TTAACCAGTG 1020 

TGGAGATGTT GCCGGGCAAG CGTTGGCTGT CCTCCCAACC TTCTTTCAAA CTAGAACCTA 1080 

CTCAGGGTCA TCGGACTAGT GAGAATTTAG CACTTACAGG AGTTGATCAT TCCTTACCAC 1140 

AGGATGGTTC AAATGCATTT ATTTCCCAGA AGCAGAATAG TAAGAGTGTG CCATCAGCTA 1200 

AAGTGTCACT GAAAGAATAC CGCGCGAAGC ATGCAGAAGA ATTGGCTGCC CAGAAGAGGC 1260 

AACTGGAGAA CATGGAAGCC AATGTGAAGT CACAATATGC ATATGCTGCC CAGAATCTCC 1320

TTTCTCATCA TGATAGCCAT TCTTCAGTCA TTCTAAAAAT GCCCATAGAG GGTTCAGAAA 1380

ACCCCGAGCG GCCTTTTCTG GAAAAGGCTG ACAAAACAGC TCTCAAAATG AGAATCCCAG 1440 

TGGCAGGTGG AGATAAAGCT GCGTCTTCAA AACCAGAGGA GATAAAAATG CGCATAAAAG 1500 

TCCATGCTGC AGCTGATAAG CACAATTCTG TAGAGGACAG TGTTACAAAG AGCCGAGAGC 1560 

ACAAAGAAGA GCGCAAGACT CACCCATCTA ATCATCATCA TCATCATAAT CACCACTCAC 1620 

ACAAGCACTC TCATTCCCAA CTTCCAGTTG GTACTGGGAA CAAACGTCCT GGTGATCCAA 1680 

AACATAGTAG CCAGACAAGC AACTTAGCAC ATAAAACCTA TAGCTTGTCT AGTTCTTTTT 1740

CCTCTTCCAG TTCTACTCGT AAAAGGGGAC CCTCTGAAGA GACTGGAGGG GCTGTGTTTG 1800 

ATCATCCAGC CAAGATTGCC AAGAGTACTA AATCCTCTTC CCTAAATTTC TCCTTCCCTT 1860 

CACTTCCTAC AATGGGTCAG ATGCCTGGGC ATAGCTCAGA CACAAGTGGC CTTTCCTTTT 1920 

CACAGCCCAG CTGTAAAACT CGTGTCCCTC ATTCGAAACT GGATAAAGGG CCCACTGGGG 1980 

CCAATGGTCA CAACACGACC CAGACAATAG ACTATCAAGA CACTGTGAAT ATGCTTCACT 2040

CCCTGCTCAG TGCCCAGGGT GTTCAGCCCA CTCAGCCCAC TGCATTTGAA TTTGTTCGTC 2100 

CTTATAGTGA CTATCTGAAT CCTCGGTCTG GTGGAATCTC CTCGAGATCT GGCAATACAG 2160 

ACAAACCCCG GCCACCACCT CTGCCATCAG AACCTCCTCC ACCACTTCCA CCCCTTCCTA 2220 

AGTAAAAAAA GAAAAAGAAG AGGAGAAAAA AACTTCTTTA AAAAAACACA TAATTTTTCT 2280 

TTTTTTTTTG GGGAAAAAAA AATTTTTTTT AAAATTTTTT CCCCAAGGGA CGGGGGAAAA 2340 

TTTTATTTTT AAAATTTTTT 2360 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2181 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: 

ATGGAGGGAG AGAGGAAGAA CAACAACAAA CGGTGGTATT TCACTCGAGA ACAGCTGGAA 60 

AATAGCCCAT CCCGTCGTTT TGGCGTGGAC CCAGATAAAG AACTTTCTTA TCGCCAGCAG 120 

GCGGCCAATC TGCTTCAGGA CATGGGGCAG CGTCTTAACG TCTCACAATT GACTATCAAC 180 

ACTGCTATAG TATACATGCA TCGATTCTAC ATGATTCAGT CCTTCACACG GTTCCCTGGA 240 

AATTCTGTGG CTCCAGCAGC CTTGTTTCTA GCAGCTAAAG TGGAGGAGCA GCCCAAAAAA 300 

TTGGAACATG TCATCAAGGT AGCACATACT TGTCTCCATC CTCAGGAATC CCTTCCTGAT 360 

ACTAGAAGTG AGGCTTATTT GCAACAAGTT CAAGATCTGG TCATTTTAGA AAGCATAATT 420 

TTGCAGACTT TAGGCTTTGA ACTAACAATT GATCACCCAC ATACTCATGT AGTAAAGTGC 480 

ACTCAACTTG TTCGAGCAAG CAAGGACTTA GCACAGACTT CTTACTTCAT GGCAACCAAC 540

AGCCTGCATT TGACCACATT TAGCCTGCAG TACACACCTC CTGTGGTGGC CTGTGTCTGC 600 

ATTCACCTGG CTTGCAAGTG GTCCAATTGG GAGATCCCAG TCTCAACTGA CGGGAAGCAC 660 

TGGTGGGAGT ATGTTGACGC CACTGTGACC TTGGAACTTT TAGATGAACT GACACATGAG 720 

TTTCTACAGA TTTTGGAGAA AACTCCCAAC AGGCTCAAAC GCATTTGGAA TTGGAGGGCA 780 

TGCGAGGCTG CCAAGAAAAC AAAAGCAGAT GACCGAGGAA CAGATGAAAA GACTTCAGAG 840 

CAGACAATCC TCAATATGAT TTCCCAGAGC TCTTCAGACA CAACCATTGC AGGTTTAATG 900 

AGCATGTCAA CTTCTACCAC AAGTGCAGTG CCTTCCCTGC CAGTCTCCGA AGAGTCATCC 960 

AGCAACTTAA CCAGTGTGGA GATGTTGCCG GGCAAGCGTT GGCTGTCCTC CCAACCTTCT 1020 

TTCAAACTAG AACCTACTCA GGGTCATCGG ACTAGTGAGA ATTTAGCACT TACAGGAGTT 1080 

GATCATTCCT TACCACAGGA TGGTTCAAAT GCATTTATTT CCCAGAAGCA GAATAGTAAG 1140 

AGTGTGCCAT CAGCTAAAGT GTCACTGAAA GAATACCGCG CGAAGCATGC AGAAGAATTG 1200 

GCTGCCCAGA AGAGGCAACT GGAGAACATG GAAGCCAATG TGAAGTCACA ATATGCATAT 1260 

GCTGCCCAGA ATCTCCTTTC TCATCATGAT AGCCATTCTT CAGTCATTCT AAAAATGCCC 1320 

ATAGAGGGTT CAGAAAACCC CGAGCGGCCT TTTCTGGAAA AGGCTGACAA AACAGCTCTC 1380

AAAATGAGAA TCCCAGTGGC AGGTGGAGAT AAAGCTGCGT CTTCAAAACC AGAGGAGATA 1440 

AAAATGCGCA TAAAAGTCCA TGCTGCAGCT GATAAGCACA ATTCTGTAGA GGACAGTGTT 1500 



ACAAAGAGCC GAGAGCACAA AGAAGAGCGC AAGACTCACC CATCTAATCA TCATCATCAT 1560

CATAATCACC ACTCACACAA GCACTCTCAT TCCCAACTTC CAGTTGGTAC TGGGAACAAA 1620 

CGTCCTGGTG ATCCAAAACA TAGTAGCCAG ACAAGCAACT TAGCACATAA AACCTATAGC 1680

TTGTCTAGTT CTTTTTCCTC TTCCAGTTCT ACTCGTAAAA GGGGACCCTC TGAAGAGACT 1740 

GGAGGGGCTG TGTTTGATCA TCCAGCCAAG ATTGCCAAGA GTACTAAATC CTCTTCCCTA 1800 

AATTTCTCCT TCCCTTCACT TCCTACAATG GGTCAGATGC CTGGGCATAG CTCAGACACA 1860 

AGTGGCCTTT CCTTTTCACA GCCCAGCTGT AAAACTCGTG TCCCTCATTC GAAACTGGAT 1920 

AAAGGGCCCA CTGGGGCCAA TGGTCACAAC ACGACCCAGA CAATAGACTA TCAAGACACT 1980 

GTGAATATGC TTCACTCCCT GCTCAGTGCC CAGGGTGTTC AGCCCACTCA GCCCACTGCA 2040 

TTTGAATTTG TTCGTCCTTA TAGTGACTAT CTGAATCCTC GGTCTGGTGG AATCTCCTCG 2100 

AGATCTGGCA ATACAGACAA ACCCCGGCCA CCACCTCTGC CATCAGAACC TCCTCCACCA 2160

CTTCCACCCC TTCCTAAGTA A 2181

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 726 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: 

Met Glu Gly Glu Arg Lys Asn Asn Asn Lys Arg Trp Tyr Phe Thr Arg
1 5 10 15

Glu Gln Leu Glu Asn Ser Pro Ser Arg Arg Phe Gly Val Asp Pro Asp
20 25 30

Lys Glu Leu Ser Tyr Arg Gln Gln Ala Ala Asn Leu Leu Gln Asp Met
35 40 45

Gly Gln Arg Leu Asn Val Ser Gln Leu Thr Ile Asn Thr Ala Ile Val
50 55 60

Tyr Met His Arg Phe Tyr Met Ile Gln Ser Phe Thr Arg Phe Pro Gly
65 70 75 80

Asn Ser Val Ala Pro Ala Ala Leu Phe Leu Ala Ala Lys Val Glu Glu
85 90 95

Gln Pro Lys Lys Leu Glu His Val Ile Lys Val Ala His Thr Cys Leu
100 105 110

His Pro Gln Glu Ser Leu Pro Asp Thr Arg Ser Glu Ala Tyr Leu Gln
115 120 125

Gln Val Gln Asp Leu Val Ile Leu Glu Ser Ile Ile Leu Gln Thr Leu
130 135 140

Gly Phe Glu Leu Thr Ile Asp His Pro His Thr His Val Val Lys Cys
145 150 155 160

Thr Gln Leu Val Arg Ala Ser Lys Asp Leu Ala Gln Thr Ser Tyr Phe
165 170 175

Met Ala Thr Asn Ser Leu His Leu Thr Thr Phe Ser Leu Gln Tyr Thr
180 185 190

Pro Pro Val Val Ala Cys Val Cys Ile His Leu Ala Cys Lys Trp Ser
195 200 205

Asn Trp Glu Ile Pro Val Ser Thr Asp Gly Lys His Trp Trp Glu Tyr
210 215 220

Val Asp Ala Thr Val Thr Leu Glu Leu Leu Asp Glu Leu Thr His Glu
225 230 235 240

Phe Leu Gln Ile Leu Glu Lys Thr Pro Asn Arg Leu Lys Arg Ile Trp
245 250 255

Asn Trp Arg Ala Cys Glu Ala Ala Lys Lys Thr Lys Ala Asp Asp Arg
260 265 270

Gly Thr Asp Glu Lys Thr Ser Glu Gln Thr Ile Leu Asn Met Ile Ser
275 280 285

Gln Ser Ser Ser Asp Thr Thr Ile Ala Gly Leu Met Ser Met Ser Thr
290 295 300

Ser Thr Thr Ser Ala Val Pro Ser Leu Pro Val Ser Glu Glu Ser Ser
305 310 315 320

Ser Asn Leu Thr Ser Val Glu Met Leu Pro Gly Lys Arg Trp Leu Ser
325 330 335

Ser Gln Pro Ser Phe Lys Leu Glu Pro Thr Gln Gly His Arg Thr Ser
340 345 350

Glu Asn Leu Ala Leu Thr Gly Val Asp His Ser Leu Pro Gln Asp Gly
355 360 365

Ser Asn Ala Phe Ile Ser Gln Lys Gln Asn Ser Lys Ser Val Pro Ser
370 375 380

Ala Lys Val Ser Leu Lys Glu Tyr Arg Ala Lys His Ala Glu Glu Leu
385 390 395 400

Ala Ala Gln Lys Arg Gln Leu Glu Asn Met Glu Ala Asn Val Lys Ser
405 410 415

Gln Tyr Ala Tyr Ala Ala Gln Asn Leu Leu Ser His His Asp Ser His
420 425 430

Ser Ser Val Ile Leu Lys Met Pro Ile Glu Gly Ser Glu Asn Pro Glu
435 440 445

Arg Pro Phe Leu Glu Lys Ala Asp Lys Thr Ala Leu Lys Met Arg Ile
450 455 460

Pro Val Ala Gly Gly Asp Lys Ala Ala Ser Ser Lys Pro Glu Glu Ile
465 470 475 480

Lys Met Arg Ile Lys Val His Ala Ala Ala Asp Lys His Asn Ser Val
485 490 495

Glu Asp Ser Val Thr Lys Ser Arg Glu His Lys Glu Glu Arg Lys Thr
500 505 510

His Pro Ser Asn His His His His His Asn His His Ser His Lys His
515 520 525

Ser His Ser Gln Leu Pro Val Gly Thr Gly Asn Lys Arg Pro Gly Asp
530 535 540

Pro Lys His Ser Ser Gln Thr Ser Asn Leu Ala His Lys Thr Tyr Ser
545 550 555 560

Leu Ser Ser Ser Phe Ser Ser Ser Ser Ser Thr Arg Lys Arg Gly Pro
565 570 575

Ser Glu Glu Thr Gly Gly Ala Val Phe Asp His Pro Ala Lys Ile Ala
580 585 590

Lys Ser Thr Lys Ser Ser Ser Leu Asn Phe Ser Phe Pro Ser Leu Pro
595 600 605

Thr Met Gly Gln Met Pro Gly His Ser Ser Asp Thr Ser Gly Leu Ser
610 615 620

Phe Ser Gln Pro Ser Cys Lys Thr Arg Val Pro His Ser Lys Leu Asp
625 630 635 640

Lys Gly Pro Thr Gly Ala Asn Gly His Asn Thr Thr Gln Thr Ile Asp
645 650 655

Tyr Gln Asp Thr Val Asn Met Leu His Ser Leu Leu Ser Ala Gln Gly
660 665 670

Val Gln Pro Thr Gln Pro Thr Ala Phe Glu Phe Val Arg Pro Tyr Ser
675 680 685

Asp Tyr Leu Asn Pro Arg Ser Gly Gly Ile Ser Ser Arg Ser Gly Asn
690 695 700

Thr Asp Lys Pro Arg Pro Pro Pro Leu Pro Ser Glu Pro Pro Pro Pro
705 710 715 720

Leu Pro Pro Leu Pro Lys
725



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
TTCCCACCAA TGCTTTCC 



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
CCATCAGTTG ATACAGGGAT CT 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
GGAATTCAGA AGGTTGTAAG ATGC 



(2) INFORMATION FOR SEQ ID NO: 54: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
ACACACAGAT GTGGTGAAAT GTACCCA 



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GCATCTTACA ACCTTCTG 



(2) INFORMATION FOR SEQ ID NO:56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
GGAATTCATG GAAAGCATTG GTGGGAAT 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
CCTCCACTAC TGGTTTGCCT GG 



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: 
GGACTAGTAT AAATATGGCG TCGGGCCGTG 



(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GGAGATCTTA CATGTTCATT CCTTGGG 



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
GGAGACAAGT ATGTGCTACC TTGATGACA 



(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GGAATTCGGG CTGCTCCTCC ACTTTAG 



(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 



(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
GGAATTCGCT GCTGGAGCCA CAGAA 



(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
GTGTCACTGA AAGAATACCG 



(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
GGAATTCAGG TGGAGATAAA GCTGC 



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
GCTCTAGATA AATATGGAGG GAGAGAGGAA 



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: 
GGAATTCTTA CTTAGGAAGG GGTGGAAGTG 30 



(2) INFORMATION FOR SEQ ID NO:67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GGAATTCTTA CTTAGGAAGG GGTGGAAGTG GTGGAGGAGG TTAC 44 



(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: 

Ala Cys Ser Tyr Ser Pro Thr Ser Pro Ser Tyr Ser Pro Thr Ser Pro 
1 5 10 15



Ser Tyr Ser Pro Thr Ser Pro Ser Lys Lys 
20 25 



